Curate input arrays to skip already ingested sample data [VS-246] #7862

Merged (21 commits), Jun 3, 2022

rsasch commented May 19, 2022

codecov bot commented May 19, 2022

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@4d30135).
The diff coverage is n/a.

@@               Coverage Diff                @@
##             ah_var_store     #7862   +/-   ##
================================================
  Coverage                ?   86.290%           
  Complexity              ?     35190           
================================================
  Files                   ?      2170           
  Lines                   ?    164888           
  Branches                ?     17786           
================================================
  Hits                    ?    142282           
  Misses                  ?     16281           
  Partials                ?      6325           

rsasch requested a review from mcovarr May 20, 2022 15:23

mcovarr (Collaborator) commented May 23, 2022

Especially since curate_input_array_files.py is already broken out into its own file, can there be some tests on this?
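Tests become straightforward if the curation logic is a pure function over the input arrays. The following is a minimal sketch of such a test. It assumes, hypothetically, that curate_input_array_files.py could expose a helper that drops the entries for already-ingested samples from a set of parallel arrays while keeping those arrays aligned. `curate_arrays`, the field names, and the file names are illustrative only; they are not the script's actual API.

```python
import unittest
from typing import Dict, List, Set


def curate_arrays(arrays: Dict[str, List[str]], key: str, skip: Set[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: drop every index whose `key` value is in `skip`.

    All arrays are assumed to be parallel: index i in each array describes the same sample.
    """
    lengths = {len(values) for values in arrays.values()}
    if len(lengths) > 1:
        raise ValueError("input arrays must all have the same length")
    # Decide which indices survive once, then rebuild every array from that same list
    # so sample names and their files stay paired.
    keep = [i for i, name in enumerate(arrays[key]) if name not in skip]
    return {field: [values[i] for i in keep] for field, values in arrays.items()}


class CurateArraysTest(unittest.TestCase):
    def test_skips_ingested_samples_and_keeps_arrays_aligned(self):
        arrays = {
            "sample_names": ["s1", "s2", "s3"],
            "input_vcfs": ["s1.vcf.gz", "s2.vcf.gz", "s3.vcf.gz"],
        }
        curated = curate_arrays(arrays, "sample_names", {"s2"})
        self.assertEqual(curated["sample_names"], ["s1", "s3"])
        self.assertEqual(curated["input_vcfs"], ["s1.vcf.gz", "s3.vcf.gz"])

    def test_rejects_ragged_arrays(self):
        with self.assertRaises(ValueError):
            curate_arrays({"sample_names": ["s1"], "input_vcfs": []}, "sample_names", set())
```

Run it with `python -m unittest <file>`. The ragged-array check is the important design choice here: it fails loudly instead of silently pairing a sample with the wrong file.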

gbggrant (Collaborator) left a comment

Looks good; two minor suggestions.

 bq --project_id=~{project_id} query --format=csv --use_legacy_sql=false -n ~{num_samples} \
-"SELECT sample_id, samples.sample_name FROM \`~{dataset_name}.~{table_name}\` AS samples JOIN \`${TEMP_TABLE}\` AS temp ON samples.sample_name=temp.sample_name" > sample_map
+"SELECT sample_id, samples.sample_name FROM \`~{dataset_name}.~{table_name}\` AS samples JOIN \`${TEMP_TABLE}\` AS temp ON samples.sample_name=temp.sample_name WHERE samples.sample_id NOT IN (SELECT sample_id FROM \`~{dataset_name}.sample_load_status\` WHERE status='FINISHED')" > sample_map

 cut -d, -f1 sample_map > gvs_ids

Nit: might be clearer as:

Suggested change
-cut -d, -f1 sample_map > gvs_ids
+cut -d ',' -f1 sample_map > gvs_ids

scripts/variantstore/wdl/GvsImportGenomes.wdl (resolved)
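The revised query keeps a sample only if its sample_id has no row with status='FINISHED' in sample_load_status. Rerunning the import therefore skips samples that already finished loading, and `cut -d, -f1` keeps only the sample_id column for gvs_ids. The following is a minimal sketch of that behavior. It runs against an in-memory SQLite database standing in for BigQuery. The table names sample_info and temp_samples (standing in for ~{table_name} and ${TEMP_TABLE}), the alias `tmp`, the 'STARTED' status value, and the sample names are all hypothetical.

```python
import csv
import io
import sqlite3

# In-memory stand-ins for the BigQuery tables (hypothetical names and data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample_info (sample_id INTEGER, sample_name TEXT);
    CREATE TABLE temp_samples (sample_name TEXT);
    CREATE TABLE sample_load_status (sample_id INTEGER, status TEXT);

    INSERT INTO sample_info VALUES (1, 'sample_a'), (2, 'sample_b'), (3, 'sample_c');
    INSERT INTO temp_samples VALUES ('sample_a'), ('sample_b'), ('sample_c');
    -- Sample 1 finished loading; sample 2 only started (assumed status), so it is retried.
    INSERT INTO sample_load_status VALUES (1, 'STARTED'), (1, 'FINISHED'), (2, 'STARTED');
    """
)

# Same shape as the new query; ORDER BY is added only to make the output deterministic.
rows = conn.execute(
    """
    SELECT sample_id, samples.sample_name
    FROM sample_info AS samples
    JOIN temp_samples AS tmp ON samples.sample_name = tmp.sample_name
    WHERE samples.sample_id NOT IN (
        SELECT sample_id FROM sample_load_status WHERE status = 'FINISHED'
    )
    ORDER BY sample_id
    """
).fetchall()

# Write the rows as CSV (like `> sample_map`), then keep column 1 (like `cut -d, -f1`).
sample_map = io.StringIO()
csv.writer(sample_map, lineterminator="\n").writerows(rows)
gvs_ids = [line.split(",")[0] for line in sample_map.getvalue().splitlines()]
print(gvs_ids)  # ['2', '3']
```

Under standard SQL NULL semantics, which BigQuery follows, `NOT IN` also has a caveat: if the subquery ever returned a NULL sample_id, the predicate would be NULL for every non-matching row, and the query would return no samples. `NOT EXISTS` does not have that failure mode.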
rsasch requested a review from gbggrant June 2, 2022 21:46
mcovarr (Collaborator) left a comment

LGTM 👍

gbggrant (Collaborator) left a comment

Looks good to me. Two minor questions.

scripts/variantstore/wdl/GvsImportGenomes.wdl (outdated, resolved)
scripts/variantstore/wdl/GvsImportGenomes.wdl (outdated, resolved)
rsasch merged commit 00e7d57 into ah_var_store Jun 3, 2022
rsasch deleted the rsa_skip_samples branch June 3, 2022 18:51
This was referenced Mar 17, 2023
Labels: None yet
Projects: None yet
3 participants