Quoting of table names #7666

Merged: 11 commits, Feb 14, 2022

@kcibul (Contributor) commented Feb 8, 2022

The table names in GvsAssignId were not quoted with backticks. That is fine unless your dataset name starts with a number, which is a totally valid identifier but requires quoting.

Recently a customer (AoU) supplied a dataset named 1kg_wgs, which exposed this problem.
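To make the failure concrete, here is a minimal bash sketch that only builds the two query strings and does not call `bq`. Only the dataset name `1kg_wgs` comes from the report; the table name `sample_info` is a hypothetical placeholder. BigQuery standard SQL does not accept an unquoted identifier that begins with a digit, so only the backtick-quoted form parses.

```shell
#!/usr/bin/env bash
# Only 1kg_wgs comes from the report; sample_info is a hypothetical table name.
dataset_name="1kg_wgs"
table_name="sample_info"

# Unquoted reference: BigQuery rejects this because 1kg_wgs starts with a digit.
unquoted="SELECT sample_id FROM ${dataset_name}.${table_name}"

# Quoted reference: the backticks are escaped (\`) so bash emits them literally
# instead of treating them as command substitution.
quoted="SELECT sample_id FROM \`${dataset_name}.${table_name}\`"

echo "$unquoted"   # SELECT sample_id FROM 1kg_wgs.sample_info
echo "$quoted"     # SELECT sample_id FROM `1kg_wgs.sample_info`
```

Quoting every table reference, rather than only those that currently start with a digit, keeps the queries correct for whatever dataset name a caller supplies.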

@rsasch left a comment

There are definitely more WDLs in GVS that don't use backticks. Did you not run into issues while running those, or are they outside the scope of this PR?

"UNION DISTINCT " \
"SELECT i.sample_name FROM items i WHERE i.sample_id IN (SELECT sample_id FROM ~{dataset_name}.sample_load_status) " \
| sed -e '/sample_name/d' > duplicates
echo "WITH items as (SELECT s.sample_id, s.sample_name, s.is_loaded FROM \`${TEMP_TABLE}\` t left outer join \`${SAMPLE_INFO_TABLE}\` s on (s.sample_name = t.sample_name)) " >> query.sql
Contributor Author

Had to adopt this approach of writing the SQL to a temp file and then running it, to get around the mix of single quotes, double quotes, the need for $ variable interpolation in bash, and the backticks required by BQ.
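A minimal sketch of the temp-file pattern, assuming hypothetical values for `TEMP_TABLE` and `SAMPLE_INFO_TABLE` (in the WDL they are set elsewhere in the task). Each fragment is appended with a double-quoted `echo`, so `${...}` expands, escaped backticks stay literal, and single-quoted SQL string literals need no escaping. The `'unknown'` filter is a hypothetical addition, included only to show a single-quoted literal inside the query.

```shell
#!/usr/bin/env bash
# Hypothetical table names; in the WDL these come from the task.
TEMP_TABLE="my-project.1kg_wgs.temp_sample_names"
SAMPLE_INFO_TABLE="my-project.1kg_wgs.sample_info"

# Start from an empty file, then append the query one fragment at a time.
rm -f query.sql
echo "WITH items AS (SELECT s.sample_id, s.sample_name FROM \`${TEMP_TABLE}\` t " >> query.sql
echo "LEFT OUTER JOIN \`${SAMPLE_INFO_TABLE}\` s ON (s.sample_name = t.sample_name)) " >> query.sql
# Hypothetical filter, only here to show a single-quoted SQL literal.
echo "SELECT sample_name FROM items WHERE sample_name != 'unknown'" >> query.sql

cat query.sql
# The assembled query can then be passed to bq as one argument, for example:
#   bq query --use_legacy_sql=false "$(cat query.sql)"
```

Because each quoting layer is resolved once, when the line is written to the file, the final `bq` call only has to pass the file contents through unchanged.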

@@ -433,7 +434,7 @@ task GetSampleIds {
python3 -c "from math import ceil; print(ceil($min_sample_id/~{samples_per_table}))" > min_sample_id

bq --project_id=~{project_id} query --format=csv --use_legacy_sql=false -n ~{num_samples} \
"SELECT sample_id, samples.sample_name FROM ~{dataset_name}.~{table_name} AS samples JOIN ${TEMP_TABLE} AS temp ON samples.sample_name=temp.sample_name" > sample_map
"SELECT sample_id, samples.sample_name FROM \`~{dataset_name}.~{table_name}\` AS samples JOIN \`${TEMP_TABLE}\` AS temp ON samples.sample_name=temp.sample_name" > sample_map
Contributor Author

An example of where we need $ interpolation (so we can't use single quotes) but also have backticks to deal with.
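The trade-off in that line can be shown in isolation. This sketch uses a hypothetical `TEMP_TABLE` value. Note that `~{dataset_name}` and `~{table_name}` in the diff are WDL placeholders filled in before the script runs, so only `${TEMP_TABLE}` depends on bash expansion.

```shell
#!/usr/bin/env bash
TEMP_TABLE="my-project.1kg_wgs.temp_sample_names"   # hypothetical value

# Single quotes: backticks are literal, but ${TEMP_TABLE} is NOT expanded.
single='SELECT * FROM `${TEMP_TABLE}`'

# Double quotes: ${TEMP_TABLE} expands, but the backticks must be escaped,
# otherwise bash runs the enclosed text as a command substitution.
double="SELECT * FROM \`${TEMP_TABLE}\`"

echo "$single"   # SELECT * FROM `${TEMP_TABLE}`
echo "$double"   # SELECT * FROM `my-project.1kg_wgs.temp_sample_names`
```

Only the double-quoted form with escaped backticks yields a query that BigQuery can run.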

@rsasch self-requested a review February 14, 2022 19:42

@rsasch left a comment

Thanks for going through all those WDLs!

@kcibul merged commit c733b4b into ah_var_store Feb 14, 2022
@kcibul deleted the kc_quoting_bug branch February 14, 2022 19:59
This was referenced Mar 17, 2023
2 participants