Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving the WDL for importing array manifest to BQ #6860

Merged
merged 4 commits into from
Oct 5, 2020

Conversation

meganshand
Copy link
Contributor

I moved the WDL for importing the array manifest from the variantstore repo and added a test. The test here only checks that the WDL succeeded, it doesn't look a the results (yet). It's ingesting the manifest to a dataset with a 7 day TTL, so the tables eventually get cleaned up. That might be too long for this case, since it adds a table each time the test is run (so on push and PR).

I plan to add more of the "end-to-end" pipeline with more testing in the future using a similar scheme, so welcome feedback on the structure.

Copy link
Contributor

@ahaessly ahaessly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one change needed

# checking for != "build37Flag" skips the header row (we don't want that numbered)
# only process rows with 29 fields - this skips some header info fields
# also skip entries that are flagged, not matched or have index conflict
awk -F ',' 'NF==29 && ($29!="ILLUMINA_FLAGGED" && $29!="INDEL_NOT_MATCHED" && $29!="INDEL_CONFLICT" && $29!="build37Flag") { flag=$29; if ($29=="PASS") flag="null"; print id++","$2","$9","$23","$24","$25","$26","$27","flag }' $TMP_SORTED > $TMP_PROC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id++ should be changed to ++id
this was in a recent PR of mine, but the rest looks correct.
Eventually we should add a test that verifies 1. there is no probe 0 and 2. that 2 different runs to this code yield the same probe->id map (these were both bugs in the past. and don't feel like you have to do that for this PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely considering it a TODO to add tests that actually make sure the data that was imported is correct. I haven't figured out the best way to do this though. Maybe I need a GATK tool that pulls down a table and compares to an expected file? Or maybe by looking at the extracted VCF from the full "end-to-end" test we'll have enough coverage? I'm not sure if that part of the pipeline pulls down information in the manifest.

{
"ImportArrayManifest.extended_manifest_csv":"/home/travis/build/broadinstitute/gatk/src/test/resources/org/broadinstitute/hellbender/tools/variantdb/arrays/tiny_manifest.csv",
"ImportArrayManifest.manifest_schema_json":"/home/travis/build/broadinstitute/gatk/scripts/variantstore_wdl/schemas/manifest_schema.json",
"ImportArrayManifest.project_id":"broad-dsde-dev",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just copied from elsewhere? Seems like there should be a gatk-test google project...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the GATK test project AFAIK. It's used in the BigQueryUtils tests (in GATK).

String? docker
}

String docker_final = select_first([docker, "us.gcr.io/broad-gatk/gatk:4.1.7.0"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you overriding this? Or are we always testing with this static GATK version?

Copy link
Contributor Author

@meganshand meganshand Oct 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not overriding it in this case because we're not using any GATK tools in this WDL. But maybe we should still be testing the current branch regardless? I know we'll need to for future WDLs (Ingest, calculate metrics, and extract will all use GATK tools that will need to be in the docker from the current branch)

@meganshand meganshand merged commit 517caeb into ah_var_store Oct 5, 2020
@meganshand meganshand deleted the ms_manifest branch October 5, 2020 18:33
kcibul pushed a commit that referenced this pull request Jan 29, 2021
* Copying wdl from variantstore repo

* Adding tests and changes to WDL

* addressing comments

* adding readme
kcibul pushed a commit that referenced this pull request Feb 1, 2021
* Copying wdl from variantstore repo

* Adding tests and changes to WDL

* addressing comments

* adding readme
Marianie-Simeon pushed a commit that referenced this pull request Feb 16, 2021
* Copying wdl from variantstore repo

* Adding tests and changes to WDL

* addressing comments

* adding readme
kcibul pushed a commit that referenced this pull request Mar 9, 2021
* Copying wdl from variantstore repo

* Adding tests and changes to WDL

* addressing comments

* adding readme
mmorgantaylor pushed a commit that referenced this pull request Apr 6, 2021
* Copying wdl from variantstore repo

* Adding tests and changes to WDL

* addressing comments

* adding readme
mmorgantaylor pushed a commit that referenced this pull request Apr 6, 2021
* Copying wdl from variantstore repo

* Adding tests and changes to WDL

* addressing comments

* adding readme
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants