-
Notifications
You must be signed in to change notification settings - Fork 587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
279 labels #7233
279 labels #7233
Conversation
068eba9
to
167353e
Compare
|
||
#Default QueryJobConfig will be merged into job configs passed into the query method. | ||
# TODO I'm worried about how well the labels will be merged.... | ||
default_config = QueryJobConfig(labels=query_labels_map, priority="INTERACTIVE", use_query_cache=False ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this get overwritten when you define labels later? how did you resolve this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't get overwritten (see screenshot with default and query based labels)
in order to make sure they dont get overwritten, any new labels must be added to the client._default_query_job_config.labels
that already exist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The screenshots seem to be from the Java based tools, do you know how this python works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one of the screen shots is from the python I promise!!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the java one is the second screen shot and is pretty boring because it's only one query
the python one is the first one and is much more interesting because it's a series of queries, and you can see the individual ones are now labeled to keep the initial label that was previously there and now can have the custom one (where I shove in my name) AND one that's based on the query specifically eg: "populate-final-extract-table"
@@ -65,6 +66,13 @@ | |||
optional = true) | |||
protected double hqGenotypeABThreshold = 0.2; | |||
|
|||
@Argument( | |||
fullName = "query-labels", | |||
doc = "Key-value pairs to be added to the extraction BQ query. Ex: --query-labels label1=value1 --query-labels label2=value2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be clearer to have Ex: --query-labels key1=value1 --query-labels key2=value2
? or if that's not right, how are the keys defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this comment, but is that what's above (or was this already addressed?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. just a couple of q's
@@ -47,6 +48,7 @@ task PrepareCallsetTask { | |||
input { | |||
String destination_cohort_table_name | |||
String query_project | |||
String? query_labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have expected this to be Array[String]? and each label in the form "key:value". I think when you pass this to the command wdl knows how to make this into a multiple argument option. @kcibul can you confirm?
(same comment for the CreateFilterSet wdl)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm good point--I'm not sure which is best
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes -- I was making the same comments above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean... it doesn't work the way it is right? What would the user pass to the WDL to supply multiple tables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack you're right---the one case I neglected to smoke test was using the wdl (and not just the plain python) with multiple custom labels
@@ -275,6 +303,7 @@ def make_extract_table(fq_pet_vet_dataset, | |||
parser.add_argument('--destination_table',type=str, help='destination table', required=True) | |||
parser.add_argument('--fq_cohort_sample_names',type=str, help='FQN of cohort table to extract, contains "sample_name" column', required=True) | |||
parser.add_argument('--query_project',type=str, help='Google project where query should be executed', required=True) | |||
parser.add_argument('--query_labels',type=str, help='Labels to put on the query that will show up in the billing', required=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the labels need to be in a specific format, maybe add that to the help
@@ -13,6 +13,7 @@ workflow GvsCreateFilterSet { | |||
String default_dataset | |||
|
|||
String query_project = data_project | |||
String? query_labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array[String] ?
@@ -210,6 +212,7 @@ task ExtractFilterTask { | |||
File? gatk_override | |||
File? service_account_json | |||
String query_project | |||
String? query_labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array[String]
@@ -47,6 +48,7 @@ task PrepareCallsetTask { | |||
input { | |||
String destination_cohort_table_name | |||
String query_project | |||
String? query_labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes -- I was making the same comments above
@@ -65,6 +66,13 @@ | |||
optional = true) | |||
protected double hqGenotypeABThreshold = 0.2; | |||
|
|||
@Argument( | |||
fullName = "query-labels", | |||
doc = "Key-value pairs to be added to the extraction BQ query. Ex: --query-labels label1=value1 --query-labels label2=value2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this comment, but is that what's above (or was this already addressed?)
@@ -57,7 +59,9 @@ def execute_with_retry(label, sql): | |||
start = time.time() | |||
while len(retry_delay) > 0: | |||
try: | |||
query = client.query(sql) | |||
labelValue = label.replace(" ","-").strip().lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename the parameter to this function (above) to be "query_name" or something since label means something else now...
labelForQuery.put("gvs_query_name", "extract-features"); | ||
// add additional key value pair labels | ||
|
||
// Each resource can have multiple labels, up to a maximum of 64. -- labelKeys has to be !>64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically !> 64 minus your two static labels above right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right---which is why I need to throw if it's more than 62!
|
||
createVQSRInputFromTableResult(storageAPIAvroReader); | ||
} | ||
|
||
private Map<String, String> createQueryLabels(List<String> labelStringList) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great candidate for some unit tests...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do I just want to unit test createQueryLabels? or ExtractFeaturesEngine?
scripts/variantstore/wdl/extract/create_cohort_extract_data_table.py
Outdated
Show resolved
Hide resolved
@@ -47,6 +48,7 @@ task PrepareCallsetTask { | |||
input { | |||
String destination_cohort_table_name | |||
String query_project | |||
String? query_labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean... it doesn't work the way it is right? What would the user pass to the WDL to supply multiple tables?
|
||
#Default QueryJobConfig will be merged into job configs passed into the query method. | ||
# TODO I'm worried about how well the labels will be merged.... | ||
default_config = QueryJobConfig(labels=query_labels_map, priority="INTERACTIVE", use_query_cache=False ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The screenshots seem to be from the Java based tools, do you know how this python works?
scripts/variantstore/wdl/extract/create_cohort_extract_data_table.py
Outdated
Show resolved
Hide resolved
default_config = QueryJobConfig(labels={ "id" : f"test_cohort_export_{output_table_prefix}"}, priority="INTERACTIVE", use_query_cache=False ) | ||
# this is where a set of labels are being created for the cohort extract. Do we want the hardcoded one to be different? | ||
query_labels_map = {} | ||
query_labels_map["id"]= f"test_cohort_export_{output_table_prefix}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's have "id" be just the {output_table_prefix}. It just uniquely identify this run
@@ -81,8 +81,13 @@ private TableResult querySampleTable(String fqSampleTableName, String whereClaus | |||
"SELECT " + SchemaUtils.SAMPLE_ID_FIELD_NAME + ", " + SchemaUtils.SAMPLE_NAME_FIELD_NAME + | |||
" FROM `" + fqSampleTableName + "`" + whereClause; | |||
|
|||
Map<String, String> labelForQuery = new HashMap<String, String>(); | |||
labelForQuery.put("gvs_tool_name", "sample-list-creation"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove this, or somehow take the label as a parameter and pass it in. There is no sample list creation tool... this is a helper function used by several tools
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmmm I assumed we got charged for the query so it would be a line in the billing table---but is that not true?
# this is where a set of labels are being created for the cohort extract. Do we want the hardcoded one to be different? | ||
query_labels_map = {} | ||
query_labels_map["id"]= f"test_cohort_export_{output_table_prefix}" | ||
query_labels_map["gvs_tool_name"]= f"create_cohort_export_{output_table_prefix}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly, let's REMOVE the {output_table_prefix}
here and have this just be the tool name (which, maybe should be gvs_prepare_callset
@@ -131,18 +134,65 @@ public void traverse() { | |||
INDEL_QUAL_THRESHOLD); | |||
|
|||
final String userDefinedFunctions = ExtractFeaturesBQ.getVQSRFeatureExtractUserDefinedFunctionsString(); | |||
Map<String, String> labelForQuery = createQueryLabels(queryLabels); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these names are really similar, maybe call this "cleanQueryLabels" or something like that?
labelStringList.add("labelkey=labelvalue"); | ||
Map<String, String> labelMap = ExtractFeaturesEngine.createQueryLabels(labelStringList); | ||
Assert.assertEquals(labelMap.get("gvs_tool_name"), "extract-features"); | ||
Assert.assertEquals(labelMap.get("gvs_query_name"), "extract-features"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you want to assert that your new label is also in there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hahaha true
import java.util.List; | ||
import java.util.Map; | ||
|
||
public class ExtractFeaturesEngineTest extends GATKBaseTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice set of test cases!
8d90f21
to
b0d140e
Compare
add a hardcoded label k-v pair add loop to grab custom vals
if (sampleTableName != null) { | ||
initializeMaps(new TableReference(sampleTableName, SchemaUtils.SAMPLE_FIELDS), executionProjectId, printDebugInformation); | ||
public SampleList(String sampleTableName, File sampleFile, String executionProjectId, boolean printDebugInformation, String originTool) { | ||
if (sampleTableName != null && originTool != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if a caller supplies a sampleTableName but null for originTool?
For GVS Feature Extract,
Cohort Extractand Prepare Callset we should add a bq labels to indicate the query and tool beinggvs_tool_name (e.g. feature-extract)
gvs_query_name (e.g. read-sample-table)
Python Prepare Callset:
Java GVS Feature Extract:
for Feature Extract
update the wdl to take in a query_labels optional string
update the GATK tool to take in a query_labels param
update the GATK tool to validate labels
update the GATK tool to add constant kv labels: "gvs_tool_name", "extract-features" and "gvs_query_name", "extract-features" (is there a way to get more explicit in the queries? isn't it just one query?)
test that this works with and without a label param passed in
for Prepare Callset
update the wdl to take in a query_labels optional string
update the python script to take in a query_labels param
update the python scrip to validate passed in labels
update the python script to add constant kv labels for ever single query individually and as a default
test that this works with and without a label param passed in