Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filtering to ExtractCohort #6971

Merged
merged 2 commits into from
Nov 20, 2020
Merged

Add filtering to ExtractCohort #6971

merged 2 commits into from
Nov 20, 2020

Conversation

meganshand
Copy link
Contributor

I think this is ready for initial review while I start testing it on larger scale data. I've left adding in the full VQSR tranche filtering as TODOs for now.

@gatk-bot
Copy link

Travis reported job failures from build 32188
Failures in the following jobs:

Test Type JDK Job ID Logs
conda openjdk8 32188.5 logs

Copy link
Contributor

@kcibul kcibul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great! A few minor comments

@@ -46,6 +55,17 @@ protected void onStartup() {

VCFHeader header = CommonCode.generateVcfHeader(sampleNames, reference.getSequenceDictionary());

if (minLocation == null && maxLocation == null && hasUserSuppliedIntervals()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome!

fullName = "filter-set-name",
doc = "Name in filter_set_name column of filtering table to use. Which training set should be applied in extract."
)
private String nameOfFilterSet = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps filterSetName instead

@@ -85,7 +86,8 @@ public ExtractCohortEngine(final String projectID,
final double vqsLodSNPThreshold,
final double vqsLodINDELThreshold,
final ProgressMeter progressMeter,
final ExtractCohort.QueryMode queryMode) {
final ExtractCohort.QueryMode queryMode,
final String nameOfFilterSet) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps filterSetName instead

final String rowRestrictionWithFilterSetName = rowRestriction + " AND " + SchemaUtils.FILTER_SET_NAME + " = '" + nameOfFilterSet + "'";

final StorageAPIAvroReader filteringTableAvroReader = new StorageAPIAvroReader(filteringTableRef, rowRestrictionWithFilterSetName, projectID);
final HashMap<Locatable, HashMap<Allele, HashMap<Allele, Double>>> fullVqsLodMap = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be good to have a comment about what each of these alleles are? is the first allele reference and the second is the alternate?

final String yng = queryRow.get("yng_status").toString();
final Allele ref = Allele.create(queryRow.get("ref").toString(), true);
final Allele alt = Allele.create(queryRow.get("alt").toString(), false);
final Locatable locatable = new SimpleInterval(contig, currentPosition, currentPosition);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of locatable, could we just leave it as location since we'll have that when we try to lookup in this table? (ie coming out of the extract BQ?)

}
final VariantContext filteredVC = builder.make();
return filteredVC;
}

private <T> LinkedHashMap<Allele, T> remapAllelesInMap(VariantContext vc, HashMap<Allele, HashMap<Allele, T>> datamap, T emptyVal) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment for what this does would be great, since even reading it... I'm not sure!

@gatk-bot
Copy link

gatk-bot commented Nov 19, 2020

Travis reported job failures from build 32210
Failures in the following jobs:

Test Type JDK Job ID Logs
conda openjdk8 32210.5 logs
conda openjdk8 32210.5 logs

@meganshand meganshand merged commit 2610b20 into ah_var_store Nov 20, 2020
@meganshand meganshand deleted the ms_extract branch November 20, 2020 19:59
kcibul pushed a commit that referenced this pull request Jan 29, 2021
* adding filtering to extractCohort

* addressing comments
kcibul pushed a commit that referenced this pull request Feb 1, 2021
* adding filtering to extractCohort

* addressing comments
Marianie-Simeon pushed a commit that referenced this pull request Feb 16, 2021
* adding filtering to extractCohort

* addressing comments
kcibul pushed a commit that referenced this pull request Mar 9, 2021
* adding filtering to extractCohort

* addressing comments
mmorgantaylor pushed a commit that referenced this pull request Apr 6, 2021
* adding filtering to extractCohort

* addressing comments
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants