Fix AoU workflow bugs #7874
Conversation
@@ -8,6 +8,8 @@ workflow GvsExtractCallset {
        String dataset_name
        String project_id

+       String cohort_project_id = project_id
PrepRanges initializes the cohort tables into destination_*; previously, this workflow assumed the cohort tables were in the GVS dataset.
@@ -182,7 +185,7 @@ task ValidateFilterSetName {

        echo "project_id = ~{query_project}" > ~/.bigqueryrc

-       OUTPUT=$(bq --location=US --project_id=~{query_project} --format=csv query --use_legacy_sql=false "SELECT filter_set_name as available_filter_set_names FROM ~{data_project}.~{data_dataset}.filter_set_info GROUP BY filter_set_name")
+       OUTPUT=$(bq --location=US --project_id=~{query_project} --format=csv query --use_legacy_sql=false "SELECT filter_set_name as available_filter_set_names FROM \`~{data_project}.~{data_dataset}.filter_set_info\` GROUP BY filter_set_name")
Were you able to run this version successfully? This doesn't execute for me without this change.
Yes. That line worked for us with Charlie, but I wonder if there is something going on with the default project ID? It feels odd that you couldn't run this.
FWIW, here is the error I got:
sdk@sha256:ff4546d0bab6048b4ae61ddfb1dfdccb12b3725f5833fb696a5c5915e5bcdd15 /cromwell_root/script
+ '[' false = true ']'
+ echo 'project_id = terra-vpc-sc-dev-d59ab2f1'
++ bq --location=US --project_id=terra-vpc-sc-dev-d59ab2f1 --format=csv query --use_legacy_sql=false 'SELECT filter_set_name as available_filter_set_names FROM fc-aou-cdr-synth-test-2.1kg_wgs_2022q1.filter_set_info GROUP BY filter_set_name'
+ OUTPUT='Error in query string: Error processing job '\''terra-vpc-sc-dev-d59ab2f1:bqjob_r6cfbf771954bf621_00000181043dd904_1'\'': Syntax error: Missing whitespace between literal and alias at [1:84]'
2022/05/27 06:39:34 Starting delocalization.
I suspect the dataset name was the issue, since it cannot be an unquoted identifier with a leading 1.
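For illustration, here is a minimal bash sketch of why the backticks fix this (the quote_fq_table helper is hypothetical, not part of the workflow). BigQuery only accepts unquoted identifiers matching [A-Za-z_][A-Za-z0-9_]*, so a dataset like 1kg_wgs_2022q1 parses as the integer literal 1 followed by an alias, producing the "Missing whitespace between literal and alias" error above. Always backtick-quoting the fully qualified table name sidesteps this:

```shell
# Hypothetical helper (sketch): always backtick-quote a fully qualified
# BigQuery table name, so components with a leading digit (the dataset
# 1kg_wgs_2022q1) or dashes (the project fc-aou-cdr-synth-test-2) are
# parsed as identifiers rather than literals.
quote_fq_table() {
  printf '`%s`' "$1"
}

table=$(quote_fq_table "fc-aou-cdr-synth-test-2.1kg_wgs_2022q1.filter_set_info")
query="SELECT filter_set_name FROM ${table} GROUP BY filter_set_name"
echo "$query"
```

Quoting unconditionally, as the change in this PR does, is simpler and safer than trying to detect which names need it.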
Nice catch!
@@ -172,7 +175,7 @@ task ValidateFilterSetName {
        String has_service_account_file = if (defined(service_account_json_path)) then 'true' else 'false'

        command <<<
-           set -e
+           set -ex
This was failing with no error output; -x shows the bash that was run and allowed me to diagnose the issue.
We should probably standardize on setting some of these options in our command blocks by default (errexit, xtrace, pipefail, possibly others).
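As a sketch of what such a standard preamble might look like (a suggestion, not the workflow's current code; option names per the bash manual):

```shell
# Candidate preamble for WDL command blocks (sketch):
#   errexit  - exit immediately when a command fails
#   xtrace   - print each command before running it (the -x that exposed the bug above)
#   pipefail - a pipeline fails if ANY stage fails, not just the last one
set -o errexit -o xtrace -o pipefail   # long form of: set -exo pipefail

# pipefail in action: without it, `false | cat` would "succeed", because
# cat (the last stage) exits 0 even though false fails.
if (false | cat); then
  status="missed the failure"
else
  status="caught the failure"
fi
echo "$status"
```

errexit alone would not have caught the bq failure if it had been piped into another command, which is why pipefail belongs in the default set too.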
@@ -229,7 +232,6 @@ task ExtractTask {
        Boolean emit_ads

-       Boolean do_not_filter_override
        String fq_ranges_dataset
unused
@@ -17,9 +17,9 @@ workflow GvsExtractCohortFromSampleNames {
        # not using the defaults in GvsPrepareCallset because we're using pre created datasets defined by the caller
        String destination_dataset_name
        String destination_project_id
-       String fq_gvs_extraction_temp_tables_dataset
+       String? fq_gvs_extraction_temp_tables_dataset
Making this optional for now. AoU will likely stop providing this and just let it go into the destination dataset.
Codecov Report
@@            Coverage Diff             @@
##           ah_var_store    #7874   +/- ##
================================================
  Coverage          ?       86.297%
  Complexity        ?       35196
================================================
  Files             ?       2170
  Lines             ?       164876
  Branches          ?       17783
================================================
  Hits              ?       142283
  Misses            ?       16270
  Partials          ?       6323
Similar issues to last time. Do we have a test case yet that simulates AoU's usage?
In addition to this, the workflow seems to function on an older version of GVS, as long as I make the following modifications: sample_info.withdrawn, sample_info.is_control.