Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding the subpopulation calculations to the VAT creation WDL #7399

Merged
merged 28 commits into from
Aug 24, 2021

Conversation

RoriCremer
Copy link
Contributor

@RoriCremer RoriCremer commented Aug 10, 2021

This pr adds the subpopulation AC/AN/AF calculations.
It does this by taking in the ancestry table and making sublists of each---then passing that list of samples into the SelectVariants GATK tool

Updated Lucid chart here: https://lucid.app/lucidchart/fee376a4-4b72-481e-a239-a027f7f6ab1f/edit?page=CsG3hy3S1zEH#

Design Doc for this work:
https://docs.google.com/document/d/1FnPu_Jkz2O9rElApAQld0v6iBEFGe22dKarVWcwNxGI/edit

misc:
how should I add the VAT validation to the VAT pipeline? Should it run automatically?

Anvil data version of this table: spec-ops-aou:anvil_100_for_testing.vat_aug19

Screen Shot 2021-08-11 at 5 38 22 PM

@RoriCremer RoriCremer changed the base branch from master to ah_var_store August 10, 2021 03:41
@RoriCremer RoriCremer marked this pull request as ready for review August 12, 2021 14:00
}

################################################################################

task MakeSubpopulationFiles { ## TODO why am I seemingly able to get the ancestry file in Terra? is it cached? I should need the SA for it---so actually, but me worth trying the SA use sooner
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still an issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an issue--but I would feel more comfortable with the data in an AoU bucket just for completeness I suppose? I'm going to do a follow on pr where I move the ancestry file into an AoU protected bucket and add a SA json to this step

if [[ $NUMRESULTS != "0" ]]; then
echo "PASS: The VAT table ~{fq_vat_table} has a correct calculation for subpopulation" > validation_results.txt
else
echo "FAIL: The VAT table ~{fq_vat_table} has an incorrect calculation for subpopulation" > validation_results.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add the vids that are returned from the query to make it easier to debug the issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! I think they are already returned in the validation_results.txt

FROM
~{fq_vat_table}
WHERE
gvs_all_ac != 20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this before we substitute 20 for values less than 20? If so then we don't really need this check. And if not then there could still be a subpop that is lower then 20 even if the all_ac isn't so I'm not sure this check is really doing the correct thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will all get ripped out during de-obfuscation pr -- it was really just so that there wouldn't be validation run at all on the rows where the AC had been obfuscated to 20 (because the math wouldn't check out there)

if [[ $NUMRESULTS != "0" ]]; then
echo "PASS: The VAT table ~{fq_vat_table} has a correct calculation for AC and the AC of subpopulations" > validation_results.txt
else
echo "FAIL: The VAT table ~{fq_vat_table} has an incorrect calculation for AC and the AC of subpopulations" > validation_results.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again maybe log the vids for easier debugging

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes---the vids are logged in the output file

@@ -17,9 +17,37 @@ workflow GvsSitesOnlyVCF {
String? service_account_json_path
File? gatk_override
File AnAcAf_annotations_template
File ancestry_file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this wdl seems relatively generic, but sense an ancestry_file is required it seem sub pop specific. Maybe we should rename this and move to the vat dir?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan was to rename the wdl when it gets split into the sites only and the vat creation wdl

it might make sense to do that as a ticket we tackle sooner rather than later

@@ -403,6 +585,31 @@ task BigQueryLoadJson {
v.gvs_all_ac,
v.gvs_all_an,
v.gvs_all_af,
v.gvs_max_af,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need this line at the start of this task:
echo "project_id = ~{project_id}" > ~/.bigqueryrc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do I not need to add the project_id to all of my queries then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no you shouldn't have to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that line just prevents an interactive error message that we sometimes see

@@ -0,0 +1,3 @@
{
"ToyLoopOptional.string": "HALP"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean to check this in and the ToyLoop wdl?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omg nope

print(seen_subpopulations)
else:
print("WARNING: This list had an unexpected subpopulation", row_subpopulation)
## TODO do I need to close these?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you do need to close them.
If you use a with clause then you don't, but I don't think that will work within this looping structure

@RoriCremer RoriCremer merged commit 79f23a5 into ah_var_store Aug 24, 2021
@RoriCremer RoriCremer deleted the rc-vs-164-subpop branch August 24, 2021 17:16
ahaessly pushed a commit that referenced this pull request Aug 27, 2021
* add subpop list

* add hard coded steps for one set of subpop calculations

* add sample list

* update the WDL to have the subworkflows

* parse the subpop ac/an/af in python

* update schemas with the new subpop calculations

* fix initial state bug for max calc

* add gvs subpops

* remove saved annotations used for debugging

* short term testing in Terra

* add ancestry file parsing

* add validation step for subpops

* adding service account to grab ancestry file

* turn off localization optional for auth issues

* add service account to subpop calcs

* cleaner linux parsing tabs

not all samples

* add ancestry file

* save annotation file for debugging

* update subpop python script

* add validation for the subpop AC

* debugging req SA json insanity

* cleaner to use expected file type

* this version works--need to unhardcode a bit

* #&^%@($&^ WDL!!!

* pr review edits

* close python file handles

* remove from dockstore
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants