Fix AoU workflow bugs #7874

calbach · 2022-05-28T00:52:46Z

Similar issues to last time. Do we have a test case yet that simulates AoU's usage?

In addition to this, the workflow seems to function on an older version of GVS, as long as I make the following modifications:

Add columns sample_info.withdrawn, sample_info.is_control
Backfill true/false into these columns as appropriate (backfilling withdrawn seems unnecessary)
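For anyone repeating this on an older GVS dataset, the two modifications could be scripted roughly as follows. This is only a sketch: `add_column_sql` and `backfill_sql` are hypothetical helpers, the table path is a placeholder, and the BOOL type and blanket `false` backfill for is_control are assumptions (a real backfill would set true for control samples). The type of withdrawn is not stated in this thread, so it is left out. The script only prints the statements; running them (for example through `bq query --use_legacy_sql=false`) is left to the caller.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print an ALTER TABLE statement that adds one column.
# Args: fully qualified table, column name, column type (the type is an assumption).
add_column_sql() {
  printf 'ALTER TABLE `%s` ADD COLUMN IF NOT EXISTS %s %s;\n' "$1" "$2" "$3"
}

# Hypothetical helper: print an UPDATE that backfills one column with a constant.
# BigQuery requires a WHERE clause on UPDATE, hence "WHERE true".
backfill_sql() {
  printf 'UPDATE `%s` SET %s = %s WHERE true;\n' "$1" "$2" "$3"
}

# Placeholder table path, not a real AoU or GVS dataset.
fq_table="my-project.my_dataset.sample_info"

add_column_sql "$fq_table" is_control BOOL
backfill_sql "$fq_table" is_control false
```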

calbach · 2022-05-28T00:53:42Z

scripts/variantstore/wdl/GvsExtractCallset.wdl

@@ -8,6 +8,8 @@ workflow GvsExtractCallset {
    String dataset_name
    String project_id

+    String cohort_project_id = project_id


PrepRanges creates the cohort tables in destination_*, but this workflow assumed the cohort tables are in the GVS dataset.

calbach · 2022-05-28T00:54:25Z

scripts/variantstore/wdl/GvsExtractCallset.wdl

@@ -182,7 +185,7 @@ task ValidateFilterSetName {

    echo "project_id = ~{query_project}" > ~/.bigqueryrc

-    OUTPUT=$(bq --location=US --project_id=~{query_project} --format=csv query --use_legacy_sql=false "SELECT filter_set_name as available_filter_set_names FROM ~{data_project}.~{data_dataset}.filter_set_info GROUP BY filter_set_name")
+    OUTPUT=$(bq --location=US --project_id=~{query_project} --format=csv query --use_legacy_sql=false "SELECT filter_set_name as available_filter_set_names FROM \`~{data_project}.~{data_dataset}.filter_set_info\` GROUP BY filter_set_name")


Were you able to run this version successfully? This doesn't execute without this change for me.

Yes. That line worked for us with Charlie, but I wonder if there is something going on with the default project id? It feels odd that you couldn't run this.

FWIW, here is the error I got:

sdk@sha256:ff4546d0bab6048b4ae61ddfb1dfdccb12b3725f5833fb696a5c5915e5bcdd15 /cromwell_root/script
+ '[' false = true ']'
+ echo 'project_id = terra-vpc-sc-dev-d59ab2f1'
++ bq --location=US --project_id=terra-vpc-sc-dev-d59ab2f1 --format=csv query --use_legacy_sql=false 'SELECT filter_set_name as available_filter_set_names FROM fc-aou-cdr-synth-test-2.1kg_wgs_2022q1.filter_set_info GROUP BY filter_set_name'
+ OUTPUT='Error in query string: Error processing job '\''terra-vpc-sc-dev-d59ab2f1:bqjob_r6cfbf771954bf621_00000181043dd904_1'\'': Syntax error: Missing whitespace between literal and alias at [1:84]'
2022/05/27 06:39:34 Starting delocalization.

I suspect the dataset name was the issue: 1kg_wgs_2022q1 starts with a digit, so it cannot be used as an unquoted identifier.

Nice catch!
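To make the quoting fix concrete, here is a small standalone bash sketch of the same pattern. The project name is a placeholder, the dataset name is the one from the error log, and `needs_quoting` is a hypothetical helper that applies a simplified version of BigQuery's unquoted-identifier rule; it ignores the special cases BigQuery allows in table paths, such as dashes in project IDs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder project; the dataset name is taken from the error log above.
data_project="my-project"
data_dataset="1kg_wgs_2022q1"

# Inside double quotes, \` is a literal backtick.
# An unescaped ` would start a command substitution instead.
query="SELECT filter_set_name FROM \`${data_project}.${data_dataset}.filter_set_info\` GROUP BY filter_set_name"
echo "$query"

# Hypothetical check for a single path component (simplified rule):
# an unquoted identifier starts with a letter or underscore and
# continues with letters, digits or underscores.
needs_quoting() {
  [[ ! "$1" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]
}

if needs_quoting "$data_dataset"; then
  echo "${data_dataset}: needs backticks"
fi
```

Quoting the whole table path unconditionally, as the diff does, avoids having to reason about which component is the problem.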

calbach · 2022-05-28T00:54:55Z

scripts/variantstore/wdl/GvsExtractCallset.wdl

@@ -172,7 +175,7 @@ task ValidateFilterSetName {
  String has_service_account_file = if (defined(service_account_json_path)) then 'true' else 'false'

  command <<<
-    set -e
+    set -ex


This was failing with no error output. -x shows the bash that was run and allowed me to diagnose the issue.

We should probably standardize on setting some of these options in our command blocks by default (errexit, xtrace, pipefail, possibly others).
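As a rough illustration of why pipefail belongs on that list, the sketch below runs the same failing pipeline in two separate bash processes. Nothing here comes from the GVS WDLs; it only demonstrates the shell behaviour. A command block preamble along the lines of `set -o errexit -o pipefail -o xtrace` would be one way to standardize, with `nounset` as a possible addition.

```shell
#!/usr/bin/env bash

# Without pipefail, a pipeline's status is that of its last command,
# so the failure of `false` is masked by `cat` succeeding.
without=0
bash -c 'false | cat' || without=$?

# With pipefail, the pipeline fails if any command in it fails.
with_pipefail=0
bash -c 'set -o pipefail; false | cat' || with_pipefail=$?

echo "without pipefail: exit ${without}"
echo "with pipefail: exit ${with_pipefail}"
```

With errexit alone, the first pipeline would not stop the script, which is one way a task can fail later with no obvious cause.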

calbach · 2022-05-28T00:55:02Z

scripts/variantstore/wdl/GvsExtractCallset.wdl

@@ -229,7 +232,6 @@ task ExtractTask {
    Boolean emit_ads

    Boolean do_not_filter_override
-    String fq_ranges_dataset


calbach · 2022-05-28T00:55:30Z

scripts/variantstore/wdl/GvsExtractCohortFromSampleNames.wdl

@@ -17,9 +17,9 @@ workflow GvsExtractCohortFromSampleNames {
    # not using the defaults in GvsPrepareCallset because we're using pre created datasets defined by the caller
    String destination_dataset_name
    String destination_project_id
+    String? fq_gvs_extraction_temp_tables_dataset


Making this optional for now. AoU will likely stop providing this, and just let it go into the destination dataset.

codecov · 2022-05-28T01:20:30Z

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@91c33df).
The diff coverage is n/a.

@@               Coverage Diff                @@
##             ah_var_store     #7874   +/-   ##
================================================
  Coverage                ?   86.297%           
  Complexity              ?     35196           
================================================
  Files                   ?      2170           
  Lines                   ?    164876           
  Branches                ?     17783           
================================================
  Hits                    ?    142283           
  Misses                  ?     16270           
  Partials                ?      6323

Fix AoU workflow bugs

d13abbd

calbach requested a review from rsasch May 28, 2022 00:52

calbach commented May 28, 2022


calbach mentioned this pull request May 28, 2022

[RW-8274][risk=low] Bump GVS extraction workflow in test all-of-us/workbench#6743

Merged

RoriCremer approved these changes Jun 3, 2022


calbach merged commit f6fde4e into ah_var_store Jun 3, 2022

calbach deleted the ch_gvs_updates branch June 3, 2022 00:19

mcovarr pushed a commit that referenced this pull request Jun 6, 2022

Fix AoU workflow bugs (#7874)

6f00ef3

This was referenced Mar 17, 2023

lb merge gvs branch #8248

Closed

testing something, please ignore #8251

Closed


calbach May 28, 2022

calbach May 28, 2022

RoriCremer Jun 1, 2022

calbach Jun 1, 2022

mcovarr Jun 2, 2022

calbach Jun 2, 2022

calbach May 28, 2022

mcovarr Jun 2, 2022

calbach May 28, 2022

calbach May 28, 2022
