Addressing OOM in CohortExtract #7245

Merged
kcibul merged 6 commits into ah_var_store from kc_mem on May 11, 2021

Conversation

@kcibul commented on May 8, 2021

Addresses https://github.com/broadinstitute/dsp-spec-ops/issues/307

  • Increase memory headroom on the VM above Java
  • Increase disk space (incidental, not related to the OOM)
  • Parameterize Gnarly usage, defaulting to false
  • Setting --emit-pls to false no longer pulls down PLs

Compared results against the baseline and saw no changes in GIAB results using the ACMG cohort.
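
As a rough illustration of the --emit-pls change, the sketch below picks between the two field lists that appear in the diff (COHORT_FIELDS and COHORT_FIELDS_NO_PL). The column strings and the fieldsToRead helper are assumptions made for this example; the PR does not show the real constant values or where the selection happens.

```java
import java.util.Arrays;
import java.util.List;

public class CohortFieldSelectionSketch {
    // Placeholder column names: the real values behind constants such as
    // LOCATION_FIELD_NAME are not shown in this PR, so these strings are assumptions.
    static final List<String> COHORT_FIELDS = Arrays.asList(
            "location", "sample_name", "state", "ref", "alt",
            "call_GT", "call_GQ", "call_RGQ", "QUALapprox", "AS_QUALapprox", "call_PL");

    // Same list without the PL column, mirroring COHORT_FIELDS_NO_PL in the diff.
    static final List<String> COHORT_FIELDS_NO_PL = Arrays.asList(
            "location", "sample_name", "state", "ref", "alt",
            "call_GT", "call_GQ", "call_RGQ", "QUALapprox", "AS_QUALapprox");

    // Hypothetical helper: choose which columns to request based on the emitPLs flag,
    // so PLs are not read at all when they will not be emitted.
    static List<String> fieldsToRead(final boolean emitPLs) {
        return emitPLs ? COHORT_FIELDS : COHORT_FIELDS_NO_PL;
    }

    public static void main(final String[] args) {
        System.out.println(fieldsToRead(true).contains("call_PL"));
        System.out.println(fieldsToRead(false).contains("call_PL"));
    }
}
```

Leaving the PL column out of the request, instead of reading it and discarding it later, is presumably what reduces the data pulled per sample.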

@kcibul changed the title from "Extra memory headroom for GvsExtractCohort" to "Addressing OOM in CohortExtract" on May 11, 2021
@ahaessly left a comment

just a few suggestions

@@ -78,6 +78,7 @@
public static final List<String> FILTER_SET_SITE_FIELDS = Arrays.asList(FILTER_SET_NAME,LOCATION_FIELD_NAME,FILTERS);

public static final List<String> COHORT_FIELDS = Arrays.asList(LOCATION_FIELD_NAME, SAMPLE_NAME_FIELD_NAME, STATE_FIELD_NAME, REF_ALLELE_FIELD_NAME, ALT_ALLELE_FIELD_NAME, CALL_GT, CALL_GQ, CALL_RGQ, QUALapprox, AS_QUALapprox, CALL_PL);//, AS_VarDP);
+public static final List<String> COHORT_FIELDS_NO_PL = Arrays.asList(LOCATION_FIELD_NAME, SAMPLE_NAME_FIELD_NAME, STATE_FIELD_NAME, REF_ALLELE_FIELD_NAME, ALT_ALLELE_FIELD_NAME, CALL_GT, CALL_GQ, CALL_RGQ, QUALapprox, AS_QUALapprox);//, AS_VarDP);
Contributor

we can probably get rid of the //, AS_VarDP if we are not using it (I think I had added that in there).

Contributor Author

👍

@@ -119,7 +124,8 @@ public ExtractCohortEngine(final String projectID,
this.filterSetName = filterSetName;

this.variantContextMerger = new ReferenceConfidenceVariantContextMerger(annotationEngine, vcfHeader);
-this.gnarlyGenotyper = new GnarlyGenotyperEngine(false, 30, false, emitPLs, true);
+this.disableGnarlyGenotyper = disableGnarlyGenotyper;
+this.gnarlyGenotyper = disableGnarlyGenotyper?null:new GnarlyGenotyperEngine(false, 30, false, emitPLs, true);
Contributor

nit - add spaces in ?:

Contributor Author

👍

final VariantContext mergedVC = variantContextMerger.merge(
unmergedCalls,
new SimpleInterval(contig, (int) start, (int) start),
refAllele.getBases()[0],
-disableGnarlyGenotyper?true:false,
+this.disableGnarlyGenotyper?true:false,
Contributor

this syntax is a little confusing here. we could either just pass this.disableGnarlyGenotyper or if we wanted to be more explicit about what we are doing and why - we could declare a variable like:
boolean removeNonRefSymbolicAllele = this.disableGnarlyGenotyper;
and then pass the new parameter in the method call.
also we should update or remove the comment above.

Contributor Author

yeah I agree -- good feedback
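
A minimal sketch of the second option suggested in this thread, assuming a simplified stand-in for the merge call. The NamedFlagSketch class and its one-argument merge method are hypothetical; only the names removeNonRefSymbolicAllele and disableGnarlyGenotyper come from the comments above.

```java
public class NamedFlagSketch {
    // Hypothetical stand-in for variantContextMerger.merge(...): the real method takes
    // more arguments; only the boolean under discussion is modeled here.
    static String merge(final boolean removeNonRefSymbolicAllele) {
        return removeNonRefSymbolicAllele ? "NON_REF removed" : "NON_REF kept";
    }

    // Before: the flag is passed through a redundant ternary.
    static String before(final boolean disableGnarlyGenotyper) {
        return merge(disableGnarlyGenotyper ? true : false);
    }

    // After: a named local states what the flag means at the call site.
    static String after(final boolean disableGnarlyGenotyper) {
        final boolean removeNonRefSymbolicAllele = disableGnarlyGenotyper;
        return merge(removeNonRefSymbolicAllele);
    }

    public static void main(final String[] args) {
        // Both forms behave identically; only readability differs.
        for (final boolean flag : new boolean[] {true, false}) {
            System.out.println(before(flag).equals(after(flag)));
        }
    }
}
```

The behavior is unchanged in both forms; the named local only documents why the flag is passed in that position.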

@@ -156,8 +156,8 @@ task ExtractTask {
# Runtime settings:
runtime {
docker: "us.gcr.io/broad-dsde-methods/broad-gatk-snapshots:varstore_d8a72b825eab2d979c8877448c0ca948fd9b34c7_change_to_hwe"
-memory: "10 GB"
disks: "local-disk 100 HDD"
+memory: "12 GB"
Member

(just a nice-to-have) does it make sense to document our rationale for these choices somewhere? even a distinction between "this value was chosen deliberately" and "this is an inherited value and we haven't thought about this much" could be useful in future?

@mmorgantaylor left a comment

lgtm!

@kcibul merged commit 30afcaa into ah_var_store on May 11, 2021
@kcibul deleted the kc_mem branch on May 11, 2021 17:52
This was referenced Mar 17, 2023
3 participants