bug in dynamic read disqualification fixed #8171

ilyasoifer
Collaborator

No description provided.

Collaborator

@jamesemery left a comment

One minor change but otherwise this looks good (or not so good that we missed it...)

```java
return read -> {
    final double maxErrorsForRead = capLikelihoods ? Math.max(MAX_ERRORS_FOR_READ_CAP, Math.ceil(read.getLength() * expectedErrorRate)) : Math.ceil(read.getLength() * expectedErrorRate);
    final double maxCatastrophicErrorsForRead = capLikelihoods ? Math.max(MAX_CATASTROPHIC_ERRORS_FOR_READ_CAP, Math.ceil(read.getLength() * catastrophicErrorRate)) : Math.ceil(read.getLength() * catastrophicErrorRate);
    return maxErrorsForRead * log10ErrorRate + maxCatastrophicErrorsForRead * catastrophicErrorRate;
```
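To make the capLikelihoods branch in the snippet above concrete, here is a small standalone sketch of just the maxErrorsForRead expression. MAX_ERRORS_FOR_READ_CAP = 3.0 is an assumption for illustration (the snippet does not show its value; 3.0 mirrors the literal in the FlowBasedHMMEngine code quoted further down), an int read length stands in for read.getLength(), and the 0.02 error rate is a made-up input. Note that Math.max acts as a floor: with capLikelihoods set, even a short read is allowed at least the cap's worth of errors.

```java
public class CapLikelihoodsDemo {
    // Assumed value for illustration only; not taken from the GATK source.
    static final double MAX_ERRORS_FOR_READ_CAP = 3.0;

    // Same shape as the maxErrorsForRead expression in the reviewed snippet,
    // with a plain read length standing in for read.getLength().
    static double maxErrorsForRead(final int readLength, final double expectedErrorRate, final boolean capLikelihoods) {
        final double lengthBased = Math.ceil(readLength * expectedErrorRate);
        return capLikelihoods ? Math.max(MAX_ERRORS_FOR_READ_CAP, lengthBased) : lengthBased;
    }

    public static void main(String[] args) {
        for (final int readLength : new int[]{60, 260}) {
            System.out.println(readLength
                    + ": capped=" + maxErrorsForRead(readLength, 0.02, true)
                    + ", uncapped=" + maxErrorsForRead(readLength, 0.02, false));
        }
        // Prints:
        // 60: capped=3.0, uncapped=2.0
        // 260: capped=6.0, uncapped=6.0
    }
}
```

For the 60-base read the floor decides the count; for the 260-base read the length-proportional term is already above it, so the flag makes no difference.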
Collaborator

Yeah, that looks like a bug, doesn't it...

Collaborator

Can you make sure to apply this to the FlowBasedHMMEngine as well?

Collaborator Author

Hey, looks like FlowBasedHMMEngine was good to begin with!

Collaborator Author

Looks like FlowBasedHMMEngine was good

```java
public ToDoubleFunction<GATKRead> log10MinTrueLikelihood(final double expectedErrorRate, final boolean capLikelihoods) {
    final double log10ErrorRate = Math.log10(expectedErrorRate);
    final double catastrophicErrorRate = Math.log10(fbargs.fillingValue);

    return read -> {
        final double maxErrorsForRead = Math.max(3.0, Math.ceil(read.getLength() * expectedErrorRate));
        final double maxCatastrophicErrorsForRead = Math.max(2.0, Math.ceil(read.getLength() * catastrophicErrorRate));
        return maxErrorsForRead * log10ErrorRate + maxCatastrophicErrorsForRead * catastrophicErrorRate;
    };
}
```
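The effect of the quoted maxCatastrophicErrorsForRead line can be checked numerically. Here catastrophicErrorRate is already a log10 value (negative for any probability below 1), so read.getLength() * catastrophicErrorRate is negative, its ceiling is at most 0, and Math.max(2.0, ...) returns 2.0 for every read length. The sketch below reproduces only that expression; the class name is hypothetical, and fillingValue = 0.001 is a made-up stand-in for fbargs.fillingValue, not its real default.

```java
public class CatastrophicCountDemo {
    // Reproduces the maxCatastrophicErrorsForRead expression from the quoted snippet,
    // where the "rate" has already been converted to log10 before it is used as a count rate.
    static double maxCatastrophicErrorsAsQuoted(final int readLength, final double fillingValue) {
        final double catastrophicErrorRate = Math.log10(fillingValue); // negative for fillingValue < 1
        return Math.max(2.0, Math.ceil(readLength * catastrophicErrorRate));
    }

    public static void main(String[] args) {
        final double fillingValue = 0.001; // hypothetical stand-in for fbargs.fillingValue
        for (final int readLength : new int[]{50, 150, 300, 1000}) {
            // The length-dependent term is never above the floor, so this is 2.0 every time.
            System.out.println(readLength + " -> " + maxCatastrophicErrorsAsQuoted(readLength, fillingValue));
        }
    }
}
```

In other words, the read-length dependence that the expression is meant to provide silently disappears.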

Collaborator

I'm not sure that's fixed... you are still computing Math.ceil(read.getLength() * catastrophicErrorRate), where catastrophicErrorRate is in log10. I think you have to make the same log change you made for the FlowBasedAligner (indeed, we should probably have factored this better in the first place to reuse the code, but that's neither here nor there).
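A minimal sketch of the kind of change being asked for, under stated assumptions: the number of expected and catastrophic errors is computed from the linear-space rates, and log10 is used only to score each error. The class and method names are hypothetical, an int read length stands in for GATKRead, fbargs.fillingValue is passed in as a plain fillingValue parameter (assumed to be a linear-space probability), the capLikelihoods flag is left out, and the 3.0/2.0 floors are copied from the FlowBasedHMMEngine snippet quoted above. This is an illustration, not the code that was merged in this PR.

```java
import java.util.function.IntToDoubleFunction;

public class MinTrueLikelihoodSketch {
    // Floors copied from the literals in the quoted FlowBasedHMMEngine snippet.
    static final double MIN_ERRORS_FOR_READ = 3.0;
    static final double MIN_CATASTROPHIC_ERRORS_FOR_READ = 2.0;

    // Number of errors allowed for a read of the given length; linearRate must NOT be log10.
    static double errorCount(final int readLength, final double linearRate, final double floor) {
        return Math.max(floor, Math.ceil(readLength * linearRate));
    }

    // Hypothetical analogue of log10MinTrueLikelihood: maps a read length to a log10
    // likelihood threshold (as the name suggests, the lowest likelihood still treated as "true").
    static IntToDoubleFunction log10MinTrueLikelihood(final double expectedErrorRate, final double fillingValue) {
        // log10 values are used only to score each error...
        final double log10ErrorRate = Math.log10(expectedErrorRate);
        final double log10CatastrophicErrorRate = Math.log10(fillingValue);
        return readLength -> {
            // ...while the error counts come from the linear-space rates.
            final double maxErrorsForRead = errorCount(readLength, expectedErrorRate, MIN_ERRORS_FOR_READ);
            final double maxCatastrophicErrorsForRead = errorCount(readLength, fillingValue, MIN_CATASTROPHIC_ERRORS_FOR_READ);
            return maxErrorsForRead * log10ErrorRate + maxCatastrophicErrorsForRead * log10CatastrophicErrorRate;
        };
    }

    public static void main(String[] args) {
        final IntToDoubleFunction threshold = log10MinTrueLikelihood(0.02, 0.001);
        for (final int readLength : new int[]{150, 4110}) {
            System.out.println(readLength + " -> " + threshold.applyAsDouble(readLength));
        }
    }
}
```

With this split, a 4110-base read gets ceil(4.11) = 5 catastrophic errors instead of being pinned at the floor of 2, so the threshold becomes more permissive as reads get longer.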

Collaborator Author

got it, will do!

@ilyasoifer force-pushed the ilyasoifer/bioin-772-fix.read.disqualification.bug branch from caf255c to 27f8716 on January 25, 2023 12:28
@ilyasoifer
Collaborator Author

@jamesemery, it seems the bug was only mine; your code was good!

@ilyasoifer
Collaborator Author

@meganshand FYI: this is a small bug fix, and another one (in t0 parsing) will follow. This one has a very small positive effect on indel performance.

```diff
@@ -23,7 +23,7 @@
 ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
 ##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
 ##INFO=<ID=SUSP_NOISY_ADJACENT_TP_VARIANT,Number=0,Type=Flag,Description="Indicates a locus where false positive allele might be affecting a true positive allele">
-##INFO=<ID=XC,Number=1,Type=Integer,Description="Indicates collapsing took place">
+##INFO=<ID=XC,Number=1,Type=Integer,Description="Indicates longer hmer collapsing took place (this is a flow-based specific tag)">
```
Contributor

Did this change happen in a separate PR?

@ilyasoifer
Collaborator Author

@jamesemery PTAL

@ilyasoifer merged commit a3c1d2c into broadinstitute:master on Jan 28, 2023
@ilyasoifer deleted the ilyasoifer/bioin-772-fix.read.disqualification.bug branch on January 28, 2023 21:07

3 participants