filter on gvs_ids for workflow #7428
@@ -323,6 +325,7 @@ task GetMaxTableIdLegacy {
 }
 output {
 Int max_table_id = read_int(stdout())
+File gvs_ids = "gvs_ids"
Is there any benefit to making gvs_ids a var?
I'm not sure what you mean by a var. Do you mean declaring
String gvs_id_file = "gvs_ids"
and then making the output ~{gvs_id_file}?
If so, I'm not sure of the trade-offs between the two. Any opinions?
I think what you have works great; I was just "thinking out loud".
@@ -838,7 +843,7 @@ task AddIsLoadedColumn {
 # set is_loaded to true if there is a corresponding pet table partition with rows for that sample_id
 bq --location=US --project_id=~{project_id} query --format=csv --use_legacy_sql=false \
-"UPDATE ~{dataset_name}.sample_info SET is_loaded = true WHERE sample_id IN (SELECT CAST(partition_id AS INT64) from ~{dataset_name}.INFORMATION_SCHEMA.PARTITIONS WHERE partition_id != '__UNPARTITIONED__' AND total_logical_bytes > 0 AND table_name LIKE \"pet_%\")"
+"UPDATE ~{dataset_name}.sample_info SET is_loaded = true WHERE sample_id IN (SELECT CAST(partition_id AS INT64) from ~{dataset_name}.INFORMATION_SCHEMA.PARTITIONS WHERE partition_id in ('~{sep="\',\'" gvs_id_array}') AND total_logical_bytes > 0 AND table_name LIKE \"pet_%\")"
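A minimal Python sketch of what the old and new filters select, useful for reasoning about the change outside BigQuery. It assumes `gvs_id_array` holds the sample ids as strings (presumably read from the `gvs_ids` file; the diff does not show this). `PartitionRow`, `is_pet_table`, `build_in_list` and `loaded_sample_ids` are hypothetical helpers, not part of the workflow. In WDL the separator `\',\'` evaluates to `','`, so the placeholder expands to a quoted, comma-separated list. Quoting keeps the comparison string-to-string, since `partition_id` in `INFORMATION_SCHEMA.PARTITIONS` is a STRING column.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class PartitionRow(NamedTuple):
    """Hypothetical stand-in for one INFORMATION_SCHEMA.PARTITIONS row (only the columns the query uses)."""
    table_name: str
    partition_id: str
    total_logical_bytes: int


def is_pet_table(name: str) -> bool:
    # BigQuery LIKE "pet_%": '_' matches exactly one character and '%' any run,
    # so this means "starts with 'pet' and has at least one more character".
    return name.startswith("pet") and len(name) >= 4


def build_in_list(gvs_ids: List[str]) -> str:
    # Mirrors ('~{sep="\',\'" gvs_id_array}'): quote each id, join with ','.
    # An empty list yields (''), which matches no partition.
    return "('" + "','".join(gvs_ids) + "')"


def loaded_sample_ids(rows: Iterable[PartitionRow],
                      gvs_ids: Optional[List[str]] = None) -> Set[int]:
    """Sample ids whose pet_* partition holds data.

    gvs_ids=None reproduces the old filter (any partition except
    __UNPARTITIONED__); a list reproduces the new IN (...) filter.
    """
    wanted = None if gvs_ids is None else set(gvs_ids)
    result: Set[int] = set()
    for row in rows:
        if not is_pet_table(row.table_name) or row.total_logical_bytes <= 0:
            continue
        if wanted is None:
            if row.partition_id == "__UNPARTITIONED__":
                continue
        elif row.partition_id not in wanted:
            continue
        # Equivalent of CAST(partition_id AS INT64) in the subquery.
        result.add(int(row.partition_id))
    return result
```

With `gvs_ids=None` the function behaves like the old `!= '__UNPARTITIONED__'` filter. Passing a list behaves like the new one, so only the samples processed in this run are marked `is_loaded = true`. The sketch also surfaces one edge case: an empty id list expands to `('')`, which matches nothing, so no rows would be updated.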
This made me wonder whether there would be a benefit to checking and validating that there are no samples with a partition_id of '__UNPARTITIONED__'.
I think that could be a good check, but I'm wondering where and when to do it. I guess we could do it here and fail if there is data in the __UNPARTITIONED__ partition.
Yeah, I'm not sure where the check could go; maybe just print a warning in the meantime? I don't think it's necessary for this PR, though, and it would be fine as a follow-on ticket.
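If the warning route is taken, the check might look roughly like this Python sketch. It assumes the rows come from `INFORMATION_SCHEMA.PARTITIONS`, filtered on the same `pet_%` tables as the `UPDATE` above. Fetching them (for example with `bq query`) is left out. All names here (`PartitionRow`, `unpartitioned_pet_tables`, `warn_if_unpartitioned`) are hypothetical.

```python
import sys
from typing import Iterable, List, NamedTuple, TextIO


class PartitionRow(NamedTuple):
    """Hypothetical stand-in for one INFORMATION_SCHEMA.PARTITIONS row."""
    table_name: str
    partition_id: str
    total_logical_bytes: int


def is_pet_table(name: str) -> bool:
    # Same semantics as BigQuery LIKE "pet_%" ('_' = exactly one character).
    return name.startswith("pet") and len(name) >= 4


def unpartitioned_pet_tables(rows: Iterable[PartitionRow]) -> List[str]:
    """Sorted names of pet_* tables whose __UNPARTITIONED__ partition holds data."""
    return sorted({
        row.table_name
        for row in rows
        if is_pet_table(row.table_name)
        and row.partition_id == "__UNPARTITIONED__"
        and row.total_logical_bytes > 0
    })


def warn_if_unpartitioned(rows: Iterable[PartitionRow],
                          stream: TextIO = sys.stderr) -> bool:
    """Print one warning per offending table; return True if any were found.

    This only warns, as suggested in review; a caller could later turn a
    True result into a hard failure instead.
    """
    tables = unpartitioned_pet_tables(rows)
    for name in tables:
        print(f"WARNING: {name} has data in its __UNPARTITIONED__ partition",
              file=stream)
    return bool(tables)
```

Returning a boolean rather than raising keeps the "warn now, maybe fail later" decision with the caller, which fits deferring the stricter check to a follow-on ticket.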
* filter on gvs_ids for workflow
* update for legacy sample_map
only set is_loaded to true for the sample ids being processed in the workflow
VS-176