fix the check for duplicates in import genomes #7470

Merged: 6 commits, Sep 17, 2021

Conversation

ahaessly
Contributor

No description provided.

Contributor

@kcibul kcibul left a comment

What didn't work in the previous implementation? (Just so I can review for that case specifically.)

scripts/variantstore/wdl/GvsImportGenomes.wdl
NAMES_FILE=~{write_lines(sample_names)}
bq load --project_id=~{project_id} ${TEMP_TABLE} $NAMES_FILE "sample_name:STRING"

bq --location=US --project_id=~{project_id} query --format=csv -n ~{num_samples} --use_legacy_sql=false \

Maybe a comment here would be good saying how this works? My take (to see if I got it right) is "Check to see if data has been loaded for any of the provided sample names"?

@gatk-bot

gatk-bot commented Sep 14, 2021

Travis reported job failures from build 36030
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
|---|---|---|---|
| unit | openjdk11 | 36030.13 | logs |
| unit | openjdk8 | 36030.3 | logs |

@ahaessly
Contributor Author

The main issue with this task was that the query results were limited to 100 rows by default, so the query now passes the -n param. Another issue was that we were running bq show on a table variable, $TABLE, that was never defined.
I also changed this because the previous approach (returning the names of all samples that have been loaded) didn't seem scalable. I wanted to return at most the number of samples we are trying to ingest.
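
For readers following along, here is a minimal sketch of how a duplicate check along these lines might look. It is not the code merged in this PR: the function names and the table names passed to them are hypothetical, and the join query is an assumed shape. Only the bq flags (--location, --project_id, --format=csv, -n, --use_legacy_sql=false) are taken from the snippet under review. It also assumes that bq's CSV output starts with a header row.

```shell
#!/usr/bin/env bash
# Sketch only: table names and the query shape are assumptions, not the
# contents of GvsImportGenomes.wdl.

# Build a query returning the requested sample names that are already present
# in the table of loaded samples (both table names are hypothetical).
build_duplicate_query() {
  local temp_table="$1" loaded_table="$2"
  printf 'SELECT t.sample_name FROM `%s` AS t JOIN `%s` AS s ON t.sample_name = s.sample_name' \
    "$temp_table" "$loaded_table"
}

# Run the query. -n raises the row cap from the default of 100 to the number
# of samples being ingested, so a large batch is not silently truncated.
run_duplicate_query() {
  local project_id="$1" num_samples="$2" query="$3"
  bq --location=US --project_id="$project_id" query --format=csv \
    -n "$num_samples" --use_legacy_sql=false "$query"
}

# Count data rows in CSV read from stdin, assuming the first line is a header.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_duplicates() {
  tail -n +2 | grep -c -v '^[[:space:]]*$' || true
}
```

A caller could pipe run_duplicate_query into count_duplicates and fail the task when the count is non-zero. Because the join returns only names that appear in the temp table, the result can never exceed the number of samples being ingested, which is why -n set to that number is a sufficient cap.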

@ahaessly ahaessly merged commit 185b5f4 into ah_var_store Sep 17, 2021
@ahaessly ahaessly deleted the ah_fix_dupes branch September 17, 2021 20:22
This was referenced Mar 17, 2023