Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding argument to GenotypeGVCFs to keep only RAW_GT_COUNT #7996

Merged
merged 13 commits into from
Oct 24, 2022
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
package org.broadinstitute.hellbender.cmdline.GATKPlugin;

import com.google.common.collect.Lists;
import org.broadinstitute.barclay.argparser.Advanced;
import org.broadinstitute.barclay.argparser.Argument;
import org.broadinstitute.hellbender.cmdline.StandardArgumentDefinitions;
Expand Down Expand Up @@ -82,4 +81,7 @@ public boolean getDisableToolDefaultAnnotations() {
public boolean getEnableAllAnnotations() {
return enableAllAnnotations;
}

@Override
public List<String> getKeepSpecifiedCombinedAnnotationNames() { return Collections.emptyList(); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the new argument is specific to GenotypeGVCFs and GenotypeGVCFsAnnotationArgumentCollection, we definitely don't want to add this method here. It should only be in GenotypeGVCFsAnnotationArgumentCollection. See my comments on GenotyeGVCFs elsewhere for more details on how to make this unnecessary.

}
Original file line number Diff line number Diff line change
Expand Up @@ -37,4 +37,7 @@ public abstract class GATKAnnotationArgumentCollection implements Serializable {
* Returns {@code true} if all annotations are enabled; {@code false} otherwise.
*/
public abstract boolean getEnableAllAnnotations();


public abstract List<String> getKeepSpecifiedCombinedAnnotationNames();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this can be removed.

}
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,6 @@
import org.broadinstitute.hellbender.utils.config.ConfigFactory;
import org.broadinstitute.hellbender.utils.config.GATKConfig;

import java.io.File;
import java.lang.reflect.Modifier;
import java.util.*;
import java.util.stream.Collectors;
Expand Down Expand Up @@ -497,4 +496,12 @@ public Class<?> getClassForPluginHelp(final String pluginName) {
return allDiscoveredAnnotations.containsKey(pluginName) ? allDiscoveredAnnotations.get(pluginName).getClass() : null;
}

public Map<String,Annotation> getAllDiscoveredAnnotations() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep this method, I'd suggest:

  • moving it so its adjacent to getResolvedInstances
  • renaming it to getDiscoveredInstances
  • adding a javadoc comment saying that it returns every annotation that is discovered anywhere on the classpath, and that it should only be used when getResolvedInstances isn't sufficient

return this.allDiscoveredAnnotations;
}

public GATKAnnotationArgumentCollection getUserArgs() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getUserArgs can also be removed from this class (see the comments elsewhere).

return this.userArgs;
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@
import htsjdk.variant.vcf.VCFHeaderLine;
import org.broadinstitute.barclay.argparser.*;
import org.broadinstitute.barclay.help.DocumentedFeature;
import org.broadinstitute.hellbender.cmdline.GATKPlugin.GATKAnnotationPluginDescriptor;
import org.broadinstitute.hellbender.cmdline.GATKPlugin.GATKReadFilterPluginDescriptor;
import org.broadinstitute.hellbender.cmdline.StandardArgumentDefinitions;
import org.broadinstitute.hellbender.cmdline.argumentcollections.DbsnpArgumentCollection;
import org.broadinstitute.hellbender.cmdline.programgroups.ShortVariantDiscoveryProgramGroup;
Expand All @@ -28,6 +30,7 @@
import org.broadinstitute.hellbender.tools.walkers.mutect.M2ArgumentCollection;
import org.broadinstitute.hellbender.utils.*;
import org.broadinstitute.hellbender.utils.variant.GATKVariantContextUtils;
import org.broadinstitute.hellbender.tools.walkers.annotator.allelespecific.ReducibleAnnotation;

import java.util.*;
import java.util.stream.Collectors;
Expand Down Expand Up @@ -132,17 +135,17 @@ public final class GenotypeGVCFs extends VariantLocusWalker {
doc = "LOD threshold to emit variant to VCF.")
protected double tlodThreshold = 3.5; //allow for some lower quality variants


/**
* Margin of error in allele fraction to consider a somatic variant homoplasmic, i.e. if there is less than a 0.1% reference allele fraction, those reads are likely errors
*/
@Argument(fullName=CombineGVCFs.ALLELE_FRACTION_DELTA_LONG_NAME, doc = "Margin of error in allele fraction to consider a somatic variant homoplasmic")
protected double afTolerance = 1e-3; //based on Q30 as a "good" base quality score

/**
* If specified, keep the combined raw annotations (e.g. AS_SB_TABLE) after genotyping. This is applicable to Allele-Specific annotations
* If specified, keep all the combined raw annotations (e.g. AS_SB_TABLE) after genotyping. This is applicable to Allele-Specific annotations. See {@link ReducibleAnnotation}
*/
@Argument(fullName=KEEP_COMBINED_LONG_NAME, shortName = KEEP_COMBINED_SHORT_NAME, doc = "If specified, keep the combined raw annotations")
@Argument(fullName=KEEP_COMBINED_LONG_NAME, shortName = KEEP_COMBINED_SHORT_NAME, doc = "If specified, keep the combined raw annotations",
mutex = {GenotypeGVCFsAnnotationArgumentCollection.KEEP_SPECIFIED_RAW_COMBINED_ANNOTATION_LONG_NAME})
protected boolean keepCombined = false;

@ArgumentCollection
Expand Down Expand Up @@ -221,6 +224,16 @@ protected GenomicsDBOptions getGenomicsDBOptions() {
@Override
public boolean useVariantAnnotations() { return true;}

@Override
public List<? extends CommandLinePluginDescriptor<?>> getPluginDescriptors() {
GATKReadFilterPluginDescriptor readFilterDescriptor = new GATKReadFilterPluginDescriptor(getDefaultReadFilters());
return useVariantAnnotations()?
Arrays.asList(readFilterDescriptor, new GATKAnnotationPluginDescriptor(
new GenotypeGVCFsAnnotationArgumentCollection(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the branch I gave you, I had instantiated GenotypeGVCFsAnnotationArgumentCollection directly like this, but it should really be a GenotypeGVCFs instance variable. Then you can just pass the instance variable to the annotation plugin here, and reference it below when you need to get call getKeepSpecifiedCombinedAnnotationNames.

Copy link
Collaborator

@cmnbroad cmnbroad Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, make sure that when you add the instance variable, you don't annotate it as an @ArgumentCollection (and maybe even add a comment saying that @ArgumentCollection is left out on purpose) since if you do the parser will complain about duplicate arg names. i.e., just add:

// @ArgumentCollection deliberately omitted since this is passed to the annotation plugin
final GenotypeGVCFsAnnotationArgumentCollection genotypeGVCFsAnnotationArgs = new GenotypeGVCFsAnnotationArgumentCollection();

getDefaultVariantAnnotations(), getDefaultVariantAnnotationGroups())):
Collections.singletonList(readFilterDescriptor);
}

@Override
public List<Class<? extends Annotation>> getDefaultVariantAnnotationGroups() {
return Arrays.asList(StandardAnnotation.class);
Expand Down Expand Up @@ -261,8 +274,12 @@ public void onTraversalStart() {
intervals = hasUserSuppliedIntervals() ? intervalArgumentCollection.getIntervals(getBestAvailableSequenceDictionary()) :
Collections.emptyList();

Collection<Annotation> variantAnnotations = makeVariantAnnotations();
annotationEngine = new VariantAnnotatorEngine(variantAnnotations, dbsnp.dbsnp, Collections.emptyList(), false, keepCombined);
final Collection<Annotation> variantAnnotations = makeVariantAnnotations();
final GATKAnnotationPluginDescriptor pluginDescriptor = getCommandLineParser().getPluginDescriptor(GATKAnnotationPluginDescriptor.class);
final List<String> annotationStringsToKeep = pluginDescriptor.getUserArgs().getKeepSpecifiedCombinedAnnotationNames();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add an instance variable of type GenotypeGVCFsAnnotationArgumentCollection called genotypeGVCFsAnnotationArgs (or something) as suggested above, then you can access that directly here, instead of asking the plugin descriptor for it. It will have it's actual type (as opposed to the base class type GATKAnnotationArgumentCollection), and you'll be able to call the GATKAnnotationArgumentCollection::getKeepSpecifiedCombinedAnnotationNames method on it directly. Then the getKeepSpecifiedCombinedAnnotationNames method can be removed from the GATKAnnotationArgumentCollection base class; and getUserArgs can be removed from the plugin descriptor. Only GenotypeGVCF's will know anything about the new arg collection or the new method:

Suggested change
final Collection<Annotation> variantAnnotations = makeVariantAnnotations();
final GATKAnnotationPluginDescriptor pluginDescriptor = getCommandLineParser().getPluginDescriptor(GATKAnnotationPluginDescriptor.class);
final List<String> annotationStringsToKeep = pluginDescriptor.getUserArgs().getKeepSpecifiedCombinedAnnotationNames();
final Collection<Annotation> variantAnnotations = makeVariantAnnotations();
final GATKAnnotationPluginDescriptor pluginDescriptor = getCommandLineParser().getPluginDescriptor(GATKAnnotationPluginDescriptor.class);
final List<String> annotationStringsToKeep = genotypeGVCFsAnnotationArgs.getKeepSpecifiedCombinedAnnotationNames();

final Map<String, Annotation> allDiscoveredAnnotations = pluginDescriptor.getAllDiscoveredAnnotations();
final Set<Annotation> annotationsToKeep = annotationStringsToKeep.stream().map(allDiscoveredAnnotations::get).collect(Collectors.toSet());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be preferable to add an override of makeVariantAnnotations into this class (adjacent to the getPluginDescriptors override), and move the computation of which annotations to keep into that, rather than inlining it here.

Also, you're basically giving priority to the new arg in the case where it conflicts with, say, an exclude arg, which I think is fine if thats what you want, but you might want to document that as part of the new arg if thats what you go with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is I still need the regular list of annotations (in addition to the annotationsToKeep). I can still move it to a separate function though.

Also I think I resolved this by not using allDiscoveredAnnotations so now excluded args should get handled correctly.

annotationEngine = new VariantAnnotatorEngine(variantAnnotations, dbsnp.dbsnp, Collections.emptyList(), false, keepCombined, annotationsToKeep);

merger = new ReferenceConfidenceVariantContextMerger(annotationEngine, getHeaderForVariants(), somaticInput, false, true);

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
package org.broadinstitute.hellbender.tools.walkers;

import org.broadinstitute.barclay.argparser.Argument;
import org.broadinstitute.hellbender.cmdline.GATKPlugin.DefaultGATKVariantAnnotationArgumentCollection;
import org.broadinstitute.hellbender.tools.walkers.annotator.allelespecific.ReducibleAnnotation;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GenotypeGVCFsAnnotationArgumentCollection extends DefaultGATKVariantAnnotationArgumentCollection {
private static final long serialVersionUID = 1L;

public static final String KEEP_SPECIFIED_RAW_COMBINED_ANNOTATION_LONG_NAME = "keep-specific-combined-raw-annotation";
public static final String KEEP_SPECIFIED_RAW_COMBINED_ANNOTATION_SHORT_NAME = "keep-specific-combined";

/**
* Keep only the specific combined raw annotations specified. Cannot be used with --keep-combined-raw-annotations which saves all raw annotations.
* Duplicate values will be ignored. See {@link ReducibleAnnotation} for more information on raw annotations.
*/
@Argument(fullName= KEEP_SPECIFIED_RAW_COMBINED_ANNOTATION_LONG_NAME, shortName = KEEP_SPECIFIED_RAW_COMBINED_ANNOTATION_SHORT_NAME, optional = true,
mutex = {GenotypeGVCFs.KEEP_COMBINED_LONG_NAME},
doc="Keep only the specific combined raw annotations specified (removing the other raw annotations). Duplicate values will be ignored.")
protected List<String> keepSpecifiedCombined = new ArrayList<>();

@Override
public List<String> getKeepSpecifiedCombinedAnnotationNames() {return Collections.unmodifiableList(keepSpecifiedCombined);}
}
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,14 @@
import htsjdk.variant.vcf.*;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.broadinstitute.hellbender.cmdline.StandardArgumentDefinitions;
import org.broadinstitute.hellbender.engine.FeatureContext;
import org.broadinstitute.hellbender.engine.FeatureDataSource;
import org.broadinstitute.hellbender.engine.FeatureInput;
import org.broadinstitute.hellbender.engine.ReferenceContext;
import org.broadinstitute.hellbender.exceptions.GATKException;
import org.broadinstitute.hellbender.exceptions.UserException;
import org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFsAnnotationArgumentCollection;
import org.broadinstitute.hellbender.tools.walkers.annotator.allelespecific.ReducibleAnnotation;
import org.broadinstitute.hellbender.tools.walkers.annotator.allelespecific.ReducibleAnnotationData;
import org.broadinstitute.hellbender.utils.Utils;
Expand Down Expand Up @@ -43,6 +45,7 @@ public final class VariantAnnotatorEngine {
private boolean expressionAlleleConcordance;
private final boolean useRawAnnotations;
private final boolean keepRawCombinedAnnotations;
private final List<String> rawAnnotationsToKeep;

private final static Logger logger = LogManager.getLogger(VariantAnnotatorEngine.class);
private final static OneShotLogger jumboAnnotationsLogger = new OneShotLogger(VariantAnnotatorEngine.class);
Expand All @@ -59,17 +62,21 @@ public final class VariantAnnotatorEngine {
* @param useRaw When this is set to true, the annotation engine will call {@link ReducibleAnnotation#annotateRawData(ReferenceContext, VariantContext, AlleleLikelihoods)}
* on annotations that extend {@link ReducibleAnnotation}, instead of {@link InfoFieldAnnotation#annotate(ReferenceContext, VariantContext, AlleleLikelihoods)},
* @param keepCombined If true, retain the combined raw annotation values instead of removing them after finalizing
* @param rawAnnotationsToKeep List of raw annotations to keep even when others are removed
*/
public VariantAnnotatorEngine(final Collection<Annotation> annotationList,
final FeatureInput<VariantContext> dbSNPInput,
final List<FeatureInput<VariantContext>> featureInputs,
final boolean useRaw,
boolean keepCombined){
final boolean keepCombined,
final Collection<Annotation> rawAnnotationsToKeep){
Utils.nonNull(featureInputs, "comparisonFeatureInputs is null");
infoAnnotations = new ArrayList<>();
genotypeAnnotations = new ArrayList<>();
jumboInfoAnnotations = new ArrayList<>();
jumboGenotypeAnnotations = new ArrayList<>();
final List<String> variantAnnotationKeys = new ArrayList<>();
final List<String> rawVariantAnnotationKeysToKeep = new ArrayList<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other minor nit: why introduce this intermediate alias ? To me having the additional alias with a slightly different name just obfuscates whats going on - I'd suggest just allocating and using the rawAnnotationsToKeep instance variable directly, since this winds up being assigned there anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, good catch. I fixed this and removed variantAnnotationKeys since it wasn't being used anymore (after moving the error message to GenotypeGVCFs)

for (Annotation annot : annotationList) {
if (annot instanceof InfoFieldAnnotation) {
infoAnnotations.add((InfoFieldAnnotation) annot);
Expand All @@ -82,11 +89,19 @@ public VariantAnnotatorEngine(final Collection<Annotation> annotationList,
} else {
throw new GATKException.ShouldNeverReachHereException("Unexpected annotation type: " + annot.getClass().getName());
}
variantAnnotationKeys.addAll(((VariantAnnotation) annot).getKeyNames());
}
variantOverlapAnnotator = initializeOverlapAnnotator(dbSNPInput, featureInputs);
reducibleKeys = new LinkedHashSet<>();
useRawAnnotations = useRaw;
keepRawCombinedAnnotations = keepCombined;
for (final Annotation rawAnnot : rawAnnotationsToKeep) {
if (!annotationList.contains(rawAnnot)) {
throw new UserException("Requested --" + GenotypeGVCFsAnnotationArgumentCollection.KEEP_SPECIFIED_RAW_COMBINED_ANNOTATION_LONG_NAME + ": " + rawAnnot + " is not available. Add requested annotation with --" + StandardArgumentDefinitions.ANNOTATION_LONG_NAME + ".");
}
rawVariantAnnotationKeysToKeep.addAll(((VariantAnnotation) rawAnnot).getKeyNames());
}
this.rawAnnotationsToKeep = rawVariantAnnotationKeysToKeep;
for (InfoFieldAnnotation annot : infoAnnotations) {
if (annot instanceof ReducibleAnnotation) {
for (final String rawKey : ((ReducibleAnnotation) annot).getRawKeyNames()) {
Expand All @@ -96,6 +111,14 @@ public VariantAnnotatorEngine(final Collection<Annotation> annotationList,
}
}

public VariantAnnotatorEngine(final Collection<Annotation> annotationList,
final FeatureInput<VariantContext> dbSNPInput,
final List<FeatureInput<VariantContext>> featureInputs,
final boolean useRaw,
boolean keepCombined){
this(annotationList, dbSNPInput, featureInputs, useRaw, keepCombined, Collections.emptyList());
}

private VariantOverlapAnnotator initializeOverlapAnnotator(final FeatureInput<VariantContext> dbSNPInput, final List<FeatureInput<VariantContext>> featureInputs) {
final Map<FeatureInput<VariantContext>, String> overlaps = new LinkedHashMap<>();
for ( final FeatureInput<VariantContext> fi : featureInputs) {
Expand Down Expand Up @@ -253,6 +276,14 @@ public Map<String, Object> combineAnnotations(final List<Allele> allelesList, Ma
public VariantContext finalizeAnnotations(VariantContext vc, VariantContext originalVC) {
final Map<String, Object> variantAnnotations = new LinkedHashMap<>(vc.getAttributes());

//save annotations that have been requested to be kept
final Map<String, Object> savedRawAnnotations = new LinkedHashMap<>();
for(final String rawAnnot : rawAnnotationsToKeep) {
if (variantAnnotations.containsKey(rawAnnot)) {
savedRawAnnotations.put(rawAnnot, variantAnnotations.get(rawAnnot));
}
}

// go through all the requested info annotationTypes
for (final InfoFieldAnnotation annotationType : infoAnnotations) {
if (annotationType instanceof ReducibleAnnotation) {
Expand Down Expand Up @@ -280,6 +311,8 @@ public VariantContext finalizeAnnotations(VariantContext vc, VariantContext orig
variantAnnotations.remove(GATKVCFConstants.VARIANT_DEPTH_KEY);
variantAnnotations.remove(GATKVCFConstants.RAW_GENOTYPE_COUNT_KEY);
}
//add back raw annotations that have specifically been requested to keep
variantAnnotations.putAll(savedRawAnnotations);

// generate a new annotated VC
final VariantContextBuilder builder = new VariantContextBuilder(vc).attributes(variantAnnotations);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,21 +3,21 @@
import htsjdk.variant.variantcontext.Allele;
import htsjdk.variant.variantcontext.VariantContext;
import htsjdk.variant.vcf.VCFCompoundHeaderLine;
import htsjdk.variant.vcf.VCFHeaderLine;
import htsjdk.variant.vcf.VCFInfoHeaderLine;
import org.broadinstitute.hellbender.engine.ReferenceContext;
import org.broadinstitute.hellbender.tools.walkers.annotator.Annotation;
import org.broadinstitute.hellbender.utils.genotyper.AlleleLikelihoods;
import org.broadinstitute.hellbender.utils.read.GATKRead;
import org.broadinstitute.hellbender.utils.variant.GATKVCFHeaderLines;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
* An interface for annotations that are calculated using raw data across samples, rather than the median (or median of median) of samples values
* The Raw annotation keeps some summary (one example might be a histogram of the raw values for each sample) of the individual sample (or allele)
* level annotation. As the annotations are combined across multiple samples the raw annotation continues to contain individual values while
* the final reduced annotation will typically be a summary statistic from these raw values.
*
*/
public interface ReducibleAnnotation extends Annotation {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,7 @@ public static VCFFormatHeaderLine getEquivalentFormatHeaderLine(final String inf
addInfoLine(new VCFInfoHeaderLine(INBREEDING_COEFFICIENT_KEY, 1, VCFHeaderLineType.Float, "Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation"));
addInfoLine(new VCFInfoHeaderLine(AS_INBREEDING_COEFFICIENT_KEY, VCFHeaderLineCount.A, VCFHeaderLineType.Float, "Allele-specific inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation"));
addInfoLine(new VCFInfoHeaderLine(EXCESS_HET_KEY, 1, VCFHeaderLineType.Float, "Phred-scaled p-value for exact test of excess heterozygosity"));
addInfoLine(new VCFInfoHeaderLine(RAW_GENOTYPE_COUNT_KEY, 3, VCFHeaderLineType.Integer, "Counts of genotypes w.r.t. the reference allele: 0/0, 0/*, */*, i.e. all alts lumped together; for use in calculating excess heterozygosity"));
addInfoLine(new VCFInfoHeaderLine(RAW_GENOTYPE_COUNT_KEY, 3, VCFHeaderLineType.Integer, "Counts of genotypes w.r.t. the reference allele in the following order: 0/0, 0/*, */*, i.e. all alts lumped together; for use in calculating excess heterozygosity"));
addInfoLine(new VCFInfoHeaderLine(LIKELIHOOD_RANK_SUM_KEY, 1, VCFHeaderLineType.Float, "Z-score from Wilcoxon rank sum test of Alt Vs. Ref haplotype likelihoods"));
addInfoLine(new VCFInfoHeaderLine(MAP_QUAL_RANK_SUM_KEY, 1, VCFHeaderLineType.Float, "Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities"));
addInfoLine(new VCFInfoHeaderLine(AS_MAP_QUAL_RANK_SUM_KEY, VCFHeaderLineCount.A, VCFHeaderLineType.Float, "allele specific Z-score From Wilcoxon rank sum test of each Alt vs. Ref read mapping qualities"));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -43,8 +43,6 @@
import java.security.NoSuchAlgorithmException;
import java.util.*;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.IntUnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

Expand Down Expand Up @@ -930,7 +928,7 @@ public void testRawGtCountAnnotation() {
args.addReference(b37_reference_20_21)
.addVCF(reblockedGVCF)
.addOutput(output)
.add("keep-combined-raw-annotations", true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this replaces the previous test in favor of this new one. Perhaps we should consider just adding a new test, so that we cover both 1) keeping all combined annotations, and 2) keeping only the specified raw annotations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've now made these two arguments (keep all combined and keeping only specified) mutually exclusive (not at the time of your review, but since then based on other comments). So I'm going to leave this test as is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just to be clear, you mean you'll revert this test to its original behavior, right? In that case, will you add another test for the keep-specified behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original behavior of --keep-combined-raw-annotations set to true while --keep-specified-raw-annotations is not empty is no longer allowed due to the mutex. So I think I am not reverting this test to the original behavior. Does that make sense? This test sets --keep-specified-raw-annotations to RawGtCount

Copy link
Contributor

@samuelklee samuelklee Sep 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, by "original behavior" I mean the current behavior of the test as it is in master. This test sets --keep-combined-raw-annotations true and hence additionally retains RAW_MQandDP.

In contrast, you modify the test in your branch so that only RAW_GT_COUNT is retained and we check that RAW_MQandDP is not. This means the code path that is responsible for retaining RAW_MQandDP is no longer run in this test, at least (and is perhaps not run for this particular input test file in any test at all). Even though the original test does not explicitly check for any particular RAW_MQandDP output, this is still an effective drop in test coverage.

Probably not anything to sweat about, but that's all that I wanted to point out in my original comment. I'll leave it up to you to decide if --keep-combined-raw-annotations true is sufficiently covered by the other tests so that we don't have to worry about replacing the original test here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Sorry, I see what you're saying now. I think I originally added this test as a way of testing RawGTCount, rather than --keep-combined-raw-annotations true, so I overlooked that it did actually test the latter too. I do think the other tests sufficiently cover --keep-combined-raw-annotations true but I very much see how that is not clear from this change.

.add(GenotypeGVCFsAnnotationArgumentCollection.KEEP_SPECIFIED_RAW_COMBINED_ANNOTATION_LONG_NAME, "RawGtCount")
.add("A", "RawGtCount");
runCommandLine(args);

Expand All @@ -942,6 +940,7 @@ public void testRawGtCountAnnotation() {
Assert.assertEquals(rawGtCount.get(0), ".");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on with the hom-ref count? I see the TODO in the RawGtCount class but I don't understand the fundamental limitation.

Incidentally, perhaps we can go ahead and expand the header-line documentation of this annotation to explicitly give the hom-ref/het/hom-var order. Maybe this is obvious to everyone else, but why not document it for dummies like me?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that RawGtCount is combined by GenomicsDB as a sum. There is no RawGtCount annotation on RefBlocks so the HomRef count always ends up being 0 in this case. It seems like ExcessHet isn't actually using RawGtCount as I expected. It recalculates the counts in GenotypeUtils.computeDiploidGenotypeCounts which doesn't seem to use RawGtCount so I'm not sure why RawGtCount is the rawKey associated with ExcessHet. Seems like this whole thing could be cleaned up so that we use the same method ExcessHet does to calculate the count of Het and HomVar genotypes at a site, but I'm not sure why we didn't do that in the first place.

Also I added the documentation to the header line.

Assert.assertEquals(rawGtCount.get(1), "2");
Assert.assertEquals(rawGtCount.get(2), "0");
Assert.assertFalse(vc.getAttributes().containsKey(GATKVCFConstants.RAW_MAPPING_QUALITY_WITH_DEPTH_KEY));
}

}
Expand Down
Loading