Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter Failing QC Sites from Extract #7201

Merged
merged 5 commits into from
Apr 14, 2021
Merged

Filter Failing QC Sites from Extract #7201

merged 5 commits into from
Apr 14, 2021

Conversation

kcibul
Copy link
Contributor

@kcibul kcibul commented Apr 13, 2021

Support for filtering sites based on filter_set_sites information

Cleaned up some naming as well, wired into WDL (along with tranches parameter)

@kcibul kcibul changed the title Filter based on Filter Failing QC Sites from Extract Apr 13, 2021

String query_project = data_project

String fq_sample_table = "~{data_project}.~{default_dataset}.sample_info"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens here if default_dataset is not defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I struggled with this a little bit to have a principle of least surprise...

If you don't supply a default_dataset AND you don't override these parameters... things won't work (ie the dataset reference won't be valid). Not much we can do there since there isn't great parameter validation in WDL with xor or groups of settings, etc.

We could make "default_dataset" required. That might make more sense since the vast majority of times... you mean to set that and having the validation would be nice. And in the rare case where someone wants to override all the table parameters... they'll still have to specify the default even though technically it's not needed. But that's the exceptional case... wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the slack convo -- I think default_dataset should be non-optional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight preference for making default_dataset required. Like you said, if it's going to be used the vast majority of the time, having it in the "required" section of inputs is going to lead to better outcomes than if it's a random optional input that really should most of the time be defined.

In the rarer cases where someone provides the fq table inputs, default_dataset will just get ignored, right?

Copy link
Member

@mmorgantaylor mmorgantaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 👍

fullYngMap.putIfAbsent(location, new HashMap<>());
fullYngMap.get(location).putIfAbsent(ref, new HashMap<>());
fullYngMap.get(location).get(ref).put(alt, yng);
try (StorageAPIAvroReader reader = new StorageAPIAvroReader(filterSetInfoTableRef, rowRestrictionWithFilterSetName, projectID)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the try here with no except ? does it take care of closing the reader when you're done?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! It's like a with in python

https://www.baeldung.com/java-try-with-resources

@@ -505,8 +557,7 @@ private VariantContext filterVariants(VariantContext mergedVC, HashMap<Allele, H
builder.filter(GATKVCFConstants.VQSR_FAILURE_INDEL);
}
} else {
// If VQSR dropped this site (there's no YNG or VQSLOD) then we'll filter it as a NAY.
builder.filter(GATKVCFConstants.NAY_FROM_YNG);
// per-conversation with Laura, if there is no information we let the site pass (ie no data does not imply failure)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@@ -48,4 +49,7 @@ public ExtractCohortFilterRecord(GenericRecord genericRecord) {

public String getAltAllele() { return this.altAllele; }

public String toString() {
return ReflectionToStringBuilder.toString(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤩

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha..I was sick of the ExtractCohortFilterRecord@12318f coming out in the logger messages :D

@@ -52,7 +52,7 @@ public StorageAPIAvroReader(final TableReference tableRef, String parentProjectI
public StorageAPIAvroReader(final TableReference tableRef, final String rowRestriction, String parentProjectId) {

try {
logger.info("Using Storage API from " + tableRef + " with '" + rowRestriction + "'");
logger.info("Using Storage API from " + tableRef.getFQTableName() + " with '" + rowRestriction + "'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for fixing this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹

String data_project
String? default_dataset

String query_project = data_project
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this make query_project optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the sense that "you don't have to put it in the WDL" -- yes.

I have no idea what the difference is between

String? query_project = "foo"

and

String query_project = "foo"

In either case... if you supply the workflow input it overrides the "foo" and if you don't supply the workflow value, it doesn't complain (ie the "foo" is used).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, and per Slack thread, we want it this way (and not String?) so that the user can't override query_project to null. 👍

@kcibul kcibul merged commit a3f5039 into ah_var_store Apr 14, 2021
@kcibul kcibul deleted the kc_site_extract branch April 14, 2021 15:52
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants