Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AoU workflow bugs #7874

Merged
merged 1 commit into from
Jun 3, 2022
Merged

Fix AoU workflow bugs #7874

merged 1 commit into from
Jun 3, 2022

Conversation

calbach
Copy link

@calbach calbach commented May 28, 2022

Similar issues to last time. Do we have a test case yet that simulates AoU's usage?

In addition to this, the workflow seems to function on an older version of GVS, as long as I make the following modifications:

  • Add columns sample_info.withdrawn, sample_info.is_control
  • Backfill true/false accordingly to these columns (backfilling withdrawn seems unnecessary)

@calbach calbach requested a review from rsasch May 28, 2022 00:52
@@ -8,6 +8,8 @@ workflow GvsExtractCallset {
String dataset_name
String project_id

String cohort_project_id = project_id
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PrepRanges initializes the cohort tables into destination_*; then this workflow assumed the cohort tables are in the GVS dataset.

@@ -182,7 +185,7 @@ task ValidateFilterSetName {

echo "project_id = ~{query_project}" > ~/.bigqueryrc

OUTPUT=$(bq --location=US --project_id=~{query_project} --format=csv query --use_legacy_sql=false "SELECT filter_set_name as available_filter_set_names FROM ~{data_project}.~{data_dataset}.filter_set_info GROUP BY filter_set_name")
OUTPUT=$(bq --location=US --project_id=~{query_project} --format=csv query --use_legacy_sql=false "SELECT filter_set_name as available_filter_set_names FROM \`~{data_project}.~{data_dataset}.filter_set_info\` GROUP BY filter_set_name")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were you able to run this version successfully? This doesn't execute without this change for me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That line worked for us with Charlie, but I wonder if there is something going on with the default project id? It feels odd that you couldn't run this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, here is the error I got:

sdk@sha256:ff4546d0bab6048b4ae61ddfb1dfdccb12b3725f5833fb696a5c5915e5bcdd15 /cromwell_root/script
+ '[' false = true ']'
+ echo 'project_id = terra-vpc-sc-dev-d59ab2f1'
++ bq --location=US --project_id=terra-vpc-sc-dev-d59ab2f1 --format=csv query --use_legacy_sql=false 'SELECT filter_set_name as available_filter_set_names FROM fc-aou-cdr-synth-test-2.1kg_wgs_2022q1.filter_set_info GROUP BY filter_set_name'
+ OUTPUT='Error in query string: Error processing job '\''terra-vpc-sc-
dev-d59ab2f1:bqjob_r6cfbf771954bf621_00000181043dd904_1'\'': Syntax error: Missing
whitespace between literal and alias at [1:84]'
2022/05/27 06:39:34 Starting delocalization.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the dataset name was the issue since it cannot be an unquoted identifier with a leading 1.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

@@ -172,7 +175,7 @@ task ValidateFilterSetName {
String has_service_account_file = if (defined(service_account_json_path)) then 'true' else 'false'

command <<<
set -e
set -ex
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was failing with no error output. -x shows the bash that was run and allowed me to diagnose the issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably standardize on setting some of these options in our command blocks by default (errexit, xtrace, pipefail, possibly others).

@@ -229,7 +232,6 @@ task ExtractTask {
Boolean emit_ads

Boolean do_not_filter_override
String fq_ranges_dataset
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

@@ -17,9 +17,9 @@ workflow GvsExtractCohortFromSampleNames {
# not using the defaults in GvsPrepareCallset because we're using pre created datasets defined by the caller
String destination_dataset_name
String destination_project_id
String? fq_gvs_extraction_temp_tables_dataset
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this optional for now. AoU will likely stop providing this, and just let it go into the destination dataset.

@codecov
Copy link

codecov bot commented May 28, 2022

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@91c33df). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff                @@
##             ah_var_store     #7874   +/-   ##
================================================
  Coverage                ?   86.297%           
  Complexity              ?     35196           
================================================
  Files                   ?      2170           
  Lines                   ?    164876           
  Branches                ?     17783           
================================================
  Hits                    ?    142283           
  Misses                  ?     16270           
  Partials                ?      6323           

@calbach calbach merged commit f6fde4e into ah_var_store Jun 3, 2022
@calbach calbach deleted the ch_gvs_updates branch June 3, 2022 00:19
mcovarr pushed a commit that referenced this pull request Jun 6, 2022
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants