VS-390. Add precision and sensitivity wdl #7813

Merged (8 commits) on Apr 29, 2022


@gbggrant (Collaborator) commented Apr 26, 2022

Converted the tie-out procedure for calculating precision and sensitivity into a WDL.

An example run (on chr20) is at: https://job-manager.dsde-prod.broadinstitute.org/jobs/03346ba0-94f8-4205-b72e-499d73de9d43

An example run (using all input VCFs and all chromosomes) is at: https://job-manager.dsde-prod.broadinstitute.org/jobs/a041f918-e72f-4a98-aa98-7f242bab0b03
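For reference, here is a minimal Python sketch of the two metrics the tie-out reports. It assumes the counts come from an `rtg vcfeval` summary, which (as I understand it) keeps true positives separately for the baseline side and the call side. The `VcfevalCounts` class and the numbers in the usage line are hypothetical, not output from the runs linked above.

```python
from dataclasses import dataclass


@dataclass
class VcfevalCounts:
    """Hypothetical container for counts taken from an rtg vcfeval summary."""
    tp_baseline: int  # truth variants matched by a call
    tp_call: int      # calls matched to a truth variant
    fp: int           # calls with no matching truth variant
    fn: int           # truth variants with no matching call


def precision(c: VcfevalCounts) -> float:
    """Fraction of calls that are correct: TP_call / (TP_call + FP)."""
    denom = c.tp_call + c.fp
    return c.tp_call / denom if denom else 0.0


def sensitivity(c: VcfevalCounts) -> float:
    """Fraction of truth variants recovered: TP_baseline / (TP_baseline + FN)."""
    denom = c.tp_baseline + c.fn
    return c.tp_baseline / denom if denom else 0.0


def f_measure(c: VcfevalCounts) -> float:
    """Harmonic mean of precision and sensitivity."""
    p, s = precision(c), sensitivity(c)
    return 2 * p * s / (p + s) if (p + s) else 0.0


if __name__ == "__main__":
    counts = VcfevalCounts(tp_baseline=90, tp_call=90, fp=10, fn=30)  # made-up numbers
    print(f"precision={precision(counts):.2f} sensitivity={sensitivity(counts):.2f}")
```

With those made-up counts, precision is 90 / 100 = 0.90 and sensitivity is 90 / 120 = 0.75.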

Make chromosome optional
Added the logic to select only input_vcfs on the specified chromosome.
codecov bot commented Apr 26, 2022

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@2381a09).
The diff coverage is n/a.

@@               Coverage Diff                @@
##             ah_var_store     #7813   +/-   ##
================================================
  Coverage                ?   86.295%           
  Complexity              ?     35192           
================================================
  Files                   ?      2170           
  Lines                   ?    164837           
  Branches                ?     17775           
================================================
  Hits                    ?    142246           
  Misses                  ?     16265           
  Partials                ?      6326           
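As a quick sanity check on the report: Codecov counts partially covered lines as not covered, so coverage is hits / (hits + misses + partials). This sketch recomputes the figure from the numbers in the table above. The function name is just for illustration.

```python
def codecov_ratio(hits: int, misses: int, partials: int) -> float:
    """Codecov-style coverage in percent; partial lines count as not covered."""
    lines = hits + misses + partials
    return 100.0 * hits / lines


if __name__ == "__main__":
    hits, misses, partials, lines = 142246, 16265, 6326, 164837
    assert hits + misses + partials == lines  # the three buckets cover every line
    print(f"{codecov_ratio(hits, misses, partials):.3f}%")  # prints 86.295%
```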

File input_vcf
String output_basename

String docker = "us.gcr.io/broad-gotc-prod/imputation-bcf-vcf:1.0.5-1.10.2-0.1.16-1649948623"
Contributor

This container may be better? But FYI, bcftools is in the ah_var_store container (and we should probably add tabix if we haven't already).

Collaborator Author

I didn't realize ah_var_store had bgzip (and tabix?), so I went with this one. It's much smaller (394 MB vs. 4.74 GB for ah_var_store), so the task should start faster because there is less image to pull. For simple tasks like this it probably makes sense to use the smaller image, but on the other hand, standardizing on one Docker image might make sense too.

scripts/variantstore/tieout/AoU_PRECISION_SENSITIVITY.md (outdated, resolved)
```
Now create single sample gVCFs for the control samples; in this example the sample names for the controls are "BI_HG-002", "UW_HG-002" and "BI_HG-003":
**truth_vcfs** - A list of the VCFs that contain the truth data used for analyzin the samples in `sample_names`.
Collaborator

Suggested change
**truth_vcfs** - A list of the VCFs that contain the truth data used for analyzin the samples in `sample_names`.
**truth_vcfs** - A list of the VCFs that contain the truth data used for analyzing the samples in `sample_names`.

```
BASE_CMD="rtg vcfeval --region chr20 --roc-subset snp,indel --vcf-score-field=INFO.MAX_AS_VQSLOD -t human_REF_SDF"
SUFFIX="_roc_filtered"
**truth_beds** - A list of the bed files for the truth data used for analyzin the samples in `sample_names`.
Collaborator

gone analyzin'

Suggested change
**truth_beds** - A list of the bed files for the truth data used for analyzin the samples in `sample_names`.
**truth_beds** - A list of the bed files for the truth data used for analyzing the samples in `sample_names`.
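To make the `BASE_CMD` in the doc concrete, here is a sketch of how a per-sample command could be assembled from `sample_names`, `truth_vcfs` and `truth_beds`. The `-b`, `-c`, `-e`, `-o` and `--sample` flags are standard `rtg vcfeval` options. The helper name, the file names and the truth-sample names are hypothetical, the lists are assumed to line up index by index, and the real WDL may wire these up differently.

```python
import shlex
from typing import List

# Shared flags, copied from BASE_CMD in AoU_PRECISION_SENSITIVITY.md.
BASE_CMD = ("rtg vcfeval --region chr20 --roc-subset snp,indel "
            "--vcf-score-field=INFO.MAX_AS_VQSLOD -t human_REF_SDF")
SUFFIX = "_roc_filtered"


def vcfeval_argv(sample: str, truth_sample: str, calls_vcf: str,
                 truth_vcf: str, truth_bed: str) -> List[str]:
    """Hypothetical helper: argv for evaluating one control sample."""
    argv = shlex.split(BASE_CMD)
    argv += [
        "-b", truth_vcf,                         # baseline (truth) VCF
        "-c", calls_vcf,                         # calls to evaluate
        "-e", truth_bed,                         # only score the truth's confident regions
        "--sample", f"{truth_sample},{sample}",  # baseline sample name, then calls sample name
        "-o", f"{sample}{SUFFIX}",               # output directory name for this sample
    ]
    return argv


if __name__ == "__main__":
    # Placeholder inputs, assumed to be paired up index by index.
    sample_names = ["BI_HG-002", "BI_HG-003"]
    truth_samples = ["HG002", "HG003"]
    truth_vcfs = ["HG002_truth.vcf.gz", "HG003_truth.vcf.gz"]
    truth_beds = ["HG002_truth.bed", "HG003_truth.bed"]
    for name, tsample, tvcf, tbed in zip(sample_names, truth_samples, truth_vcfs, truth_beds):
        print(shlex.join(vcfeval_argv(name, tsample, f"{name}.vcf.gz", tvcf, tbed)))
```

Building an argv list (rather than concatenating strings) keeps file names with unusual characters safe if the command is later run through `subprocess`.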


if (false) {
String? none = "None"
}
Collaborator

Not sure I understand this construct?

Collaborator Author

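This is the trick for creating an undefined value ('None') in WDL: a declaration inside an `if` block is optional outside it, and because `if (false)` never runs, `none` is always undefined.
I know of no other way to set a variable to undefined.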

}

String? contig = if (chromosome == "all") then none else chromosome

Contributor

Nit: I found this bit (lines 18-22) a little confusing without a comment. And why are we passing the contig to SelectVariants if we've already split by contig? Hmm.

Collaborator Author

I'll add the comment. `none` is defined as an undefined value; this is the trick I used to let the workflow accept the defined value 'all' for `chromosome` and translate it into an undefined value, which is then passed to SelectVariants.
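For anyone less familiar with WDL optionals, here is the same idea as a Python sketch: 'all' maps to `None`, and the interval argument is only added when a contig is defined. `-V`, `-O` and `-L` are standard GATK SelectVariants arguments. The function names and file names are hypothetical, and this only mirrors the logic rather than the actual WDL task.

```python
from typing import List, Optional


def resolve_contig(chromosome: str) -> Optional[str]:
    """Same logic as: if (chromosome == "all") then none else chromosome."""
    return None if chromosome == "all" else chromosome


def select_variants_argv(input_vcf: str, output_vcf: str,
                         contig: Optional[str]) -> List[str]:
    """Hypothetical helper: build a SelectVariants command line."""
    argv = ["gatk", "SelectVariants", "-V", input_vcf, "-O", output_vcf]
    if contig is not None:
        # Restrict to one contig only when one was requested;
        # with no -L, SelectVariants processes the whole file.
        argv += ["-L", contig]
    return argv


if __name__ == "__main__":
    for chrom in ["chr20", "all"]:
        print(select_variants_argv("input.vcf.gz", f"out.{chrom}.vcf.gz", resolve_contig(chrom)))
```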

@gbggrant gbggrant requested a review from mcovarr April 27, 2022 16:59
@gbggrant gbggrant merged commit d51a4e5 into ah_var_store Apr 29, 2022
@gbggrant gbggrant deleted the gg_AddPrecisionAndSensitivityWdl branch April 29, 2022 10:51