Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for incremental addition of data to alt_allele [VS-52] #7993

Merged
merged 19 commits into from
Aug 22, 2022

Conversation

rsasch
Copy link

@rsasch rsasch commented Aug 17, 2022

  • changed CREATE OR REPLACE TABLE to CREATE TABLE IF NOT EXISTS for alt_allele
  • pass around max sample id already in alt_allele to figure out which vet_ table to start with and what sample IDs data to grab
  • tied out by comparing gvs-internal.rsa_gvs_quickstart_dev.alt_allele (done with existing version) vs. gvs-internal.rsa_gvs_quickstart_dev.alt_allele_incremental (done in two batches with code from this branch)
  • renamed WDL to GvsPopulateAltAllele

Closed https://broadworkbench.atlassian.net/browse/VS-52

@codecov
Copy link

codecov bot commented Aug 17, 2022

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@0130bb8). Click here to learn what that means.
The diff coverage is n/a.

Additional details and impacted files
@@               Coverage Diff                @@
##             ah_var_store     #7993   +/-   ##
================================================
  Coverage                ?   77.030%           
  Complexity              ?     21708           
================================================
  Files                   ?      1375           
  Lines                   ?     82251           
  Branches                ?     13121           
================================================
  Hits                    ?     63358           
  Misses                  ?     13764           
  Partials                ?      5129           

Copy link
Collaborator

@mcovarr mcovarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some draft nitpicks, looks promising!

scripts/variantstore/wdl/GvsCreateAltAllele.wdl Outdated Show resolved Hide resolved
scripts/variantstore/wdl/GvsCreateAltAllele.wdl Outdated Show resolved Hide resolved
scripts/variantstore/wdl/GvsCreateAltAllele.wdl Outdated Show resolved Hide resolved
scripts/variantstore/wdl/GvsCreateAltAllele.wdl Outdated Show resolved Hide resolved
scripts/variantstore/wdl/GvsCreateAltAllele.wdl Outdated Show resolved Hide resolved
scripts/variantstore/wdl/GvsCreateAltAllele.wdl Outdated Show resolved Hide resolved
@rsasch rsasch marked this pull request as ready for review August 18, 2022 21:52
Copy link
Collaborator

@gbggrant gbggrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

scripts/variantstore/wdl/GvsPopulateAltAllele.wdl Outdated Show resolved Hide resolved
@@ -39,6 +39,7 @@ def populate_alt_allele_table(call_set_identifier, query_project, vet_table_name
parser.add_argument('--query_project',type=str, help='Google project where query should be executed', required=True)
parser.add_argument('--vet_table_name',type=str, help='vet table name to ingest', required=True)
parser.add_argument('--fq_dataset',type=str, help='project and dataset for data', required=True)
parser.add_argument('--max_sample_id',type=str, help='Maximum value of sample_id already loaded', required=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annoying nit:

Suggested change
parser.add_argument('--max_sample_id',type=str, help='Maximum value of sample_id already loaded', required=True)
parser.add_argument('--max_sample_id', type=str, help='Maximum value of sample_id already loaded', required=True)

@@ -103,7 +157,7 @@ task CreateAltAlleleTable {

echo "project_id = ~{project_id}" > ~/.bigqueryrc
bq query --location=US --project_id=~{project_id} --format=csv --use_legacy_sql=false ~{bq_labels} \
'CREATE OR REPLACE TABLE `~{project_id}.~{dataset_name}.alt_allele` (
'CREATE TABLE IF NOT EXISTS `~{project_id}.~{dataset_name}.alt_allele` (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, I wasn't aware of this command… nice!

position1 as (select * from `{fq_vet_table}` WHERE call_GT IN ('0/1', '1/0', '1/1', '0|1', '1|0', '1|1', '0/2', '0|2','2/0', '2|0')),
position2 as (select * from `{fq_vet_table}` WHERE call_GT IN ('1/2', '1|2', '2/1', '2|1'))"""
WITH
position1 as (select * from `{fq_vet_table}` WHERE call_GT IN ('0/1', '1/0', '1/1', '0|1', '1|0', '1|1', '0/2', '0|2','2/0', '2|0') AND sample_id > {max_sample_id}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smart, I like the use of > rather than some IN kind of clause

@rsasch rsasch merged commit 187fe60 into ah_var_store Aug 22, 2022
@rsasch rsasch deleted the rsa_vs_52_incremental_alt_allele branch August 22, 2022 19:32
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants