279 labels #7233

RoriCremer · 2021-04-26T19:59:30Z

For GVS Feature Extract, ~~Cohort Extract~~ and Prepare Callset we should add BQ labels to indicate the query and tool being run:

gvs_tool_name (e.g. feature-extract)
gvs_query_name (e.g. read-sample-table)

Python Prepare Callset:

Java GVS Feature Extract:

for Feature Extract
update the wdl to take in a query_labels optional string
update the GATK tool to take in a query_labels param
update the GATK tool to validate labels
update the GATK tool to add constant kv labels: "gvs_tool_name", "extract-features" and "gvs_query_name", "extract-features" (is there a way to get more explicit in the queries? isn't it just one query?)
test that this works with and without a label param passed in

for Prepare Callset
update the wdl to take in a query_labels optional string
update the python script to take in a query_labels param
update the python script to validate passed-in labels
update the python script to add constant kv labels for every single query individually and as a default
test that this works with and without a label param passed in
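
The validation and constant-label steps in the two task lists above could be sketched in Python roughly as follows. This is an illustration, not the code in this PR: `parse_query_labels`, `LABEL_KEY_RE`, and `LABEL_VALUE_RE` are hypothetical names, and the rules encoded here (lowercase letters, digits, underscores and dashes only; keys of 1 to 63 characters starting with a lowercase letter; values of up to 63 characters) are a simplified, ASCII-only reading of the BigQuery label requirements.

```python
import re

# Simplified, ASCII-only reading of the BigQuery label rules (assumption):
# keys are 1-63 chars and start with a lowercase letter; values are 0-63 chars.
LABEL_KEY_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")
LABEL_VALUE_RE = re.compile(r"[a-z0-9_-]{0,63}")


def parse_query_labels(label_strings, tool_name, query_name):
    """Turn ["key=value", ...] into a dict and add the two constant GVS labels."""
    labels = {}
    for item in label_strings or []:  # None means no custom labels were passed
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"label {item!r} is not in key=value form")
        if not LABEL_KEY_RE.fullmatch(key):
            raise ValueError(f"invalid label key: {key!r}")
        if not LABEL_VALUE_RE.fullmatch(value):
            raise ValueError(f"invalid label value: {value!r}")
        labels[key] = value
    # Constant labels are written last so a custom label cannot override them.
    labels["gvs_tool_name"] = tool_name
    labels["gvs_query_name"] = query_name
    return labels
```

Calling it with `None` returns only the two constant labels, which covers the "with and without a label param" test case in both lists.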

mmorgantaylor · 2021-05-05T15:59:52Z

scripts/variantstore/wdl/extract/create_cohort_extract_data_table.py

+
+    #Default QueryJobConfig will be merged into job configs passed into the query method.
+    # TODO I'm worried about how well the labels will be merged....
+    default_config = QueryJobConfig(labels=query_labels_map, priority="INTERACTIVE", use_query_cache=False )


does this get overwritten when you define labels later? how did you resolve this?

it doesn't get overwritten (see screenshot with default and query based labels)
in order to make sure they don't get overwritten, any new labels must be added to the ones that already exist in client._default_query_job_config.labels

The screenshots seem to be from the Java based tools, do you know how this python works?

one of the screen shots is from the python I promise!!!!

the java one is the second screen shot and is pretty boring because it's only one query
the python one is the first one and is much more interesting because it's a series of queries. You can see that the individual queries keep the initial label that was previously there, and can now also carry the custom one (where I shove in my name) AND one that's based on the specific query, e.g. "populate-final-extract-table"
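
The merge behaviour discussed in this thread can be illustrated without a BigQuery client. The sketch below assumes, as the reply above describes, that a per-query `labels` value would replace the default labels wholesale unless the defaults are copied in first. `merge_labels` is a hypothetical helper, the label values are made up, and plain dicts stand in for the `labels` of a `QueryJobConfig`.

```python
def merge_labels(default_labels, query_labels):
    """Return a new dict holding the default labels plus the per-query ones.

    Per-query labels win on a key collision; neither input is modified.
    """
    merged = dict(default_labels or {})
    merged.update(query_labels or {})
    return merged


# Stand-ins for the default config's labels and one query's extra label.
default_labels = {"id": "my_prefix", "gvs_tool_name": "gvs_prepare_callset"}
per_query = merge_labels(default_labels, {"gvs_query_name": "populate-final-extract-table"})
print(per_query)
```

The merged dict is what would be handed to the per-query job config, so each query carries both the run-level labels and its own query name.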

mmorgantaylor · 2021-05-05T16:01:21Z

src/main/java/org/broadinstitute/hellbender/tools/variantdb/nextgen/ExtractFeatures.java

@@ -65,6 +66,13 @@
        optional = true)
    protected double hqGenotypeABThreshold = 0.2;

+    @Argument(
+        fullName = "query-labels",
+        doc = "Key-value pairs to be added to the extraction BQ query. Ex: --query-labels label1=value1 --query-labels label2=value2",


would it be clearer to have Ex: --query-labels key1=value1 --query-labels key2=value2 ? or if that's not right, how are the keys defined?

I agree with this comment, but is that what's above (or was this already addressed?)

ahaessly

looks good. just a couple of q's

ahaessly · 2021-05-05T15:58:11Z

scripts/variantstore/wdl/GvsPrepareCallset.wdl

@@ -47,6 +48,7 @@ task PrepareCallsetTask {
    input {
        String destination_cohort_table_name
        String query_project
+        String? query_labels


I would have expected this to be Array[String]? and each label in the form "key:value". I think when you pass this to the command wdl knows how to make this into a multiple argument option. @kcibul can you confirm?
(same comment for the CreateFilterSet wdl)

hmm good point--I'm not sure which is best

Yes -- I was making the same comments above

I mean... it doesn't work the way it is right? What would the user pass to the WDL to supply multiple tables?

ack you're right---the one case I neglected to smoke test was using the wdl (and not just the plain python) with multiple custom labels

ahaessly · 2021-05-05T16:00:53Z

scripts/variantstore/wdl/extract/create_cohort_extract_data_table.py

@@ -275,6 +303,7 @@ def make_extract_table(fq_pet_vet_dataset,
  parser.add_argument('--destination_table',type=str, help='destination table', required=True)
  parser.add_argument('--fq_cohort_sample_names',type=str, help='FQN of cohort table to extract, contains "sample_name" column', required=True)
  parser.add_argument('--query_project',type=str, help='Google project where query should be executed', required=True)
+  parser.add_argument('--query_labels',type=str, help='Labels to put on the query that will show up in the billing', required=False)


if the labels need to be in a specific format, maybe add that to the help
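
One way to address both the format question and the multiple-label question is to make the flag repeatable and spell out the expected form in the help text. This is a sketch only: the `key=value` format and the use of `action="append"` are assumptions for illustration, not necessarily what the script ended up doing.

```python
import argparse

parser = argparse.ArgumentParser(description="Sketch of a repeatable --query_labels flag")
parser.add_argument(
    "--query_labels",
    action="append",   # each occurrence adds one entry to a list
    default=None,      # stays None when the flag is never passed
    required=False,
    help="Label to attach to each BigQuery query, in key=value form; "
         "repeat the flag for multiple labels "
         "(e.g. --query_labels key1=value1 --query_labels key2=value2)",
)

args = parser.parse_args(["--query_labels", "key1=value1", "--query_labels", "key2=value2"])
print(args.query_labels)  # ['key1=value1', 'key2=value2']
```

On the WDL side this would pair with an array-typed input, so that each element is passed as its own `--query_labels` occurrence.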

kcibul · 2021-05-05T18:28:42Z

scripts/variantstore/wdl/GvsCreateFilterSet.wdl

@@ -13,6 +13,7 @@ workflow GvsCreateFilterSet {
        String default_dataset

        String query_project = data_project
+        String? query_labels


Array[String] ?

kcibul · 2021-05-05T18:29:44Z

scripts/variantstore/wdl/GvsCreateFilterSet.wdl

@@ -210,6 +212,7 @@ task ExtractFilterTask {
        File? gatk_override
        File? service_account_json
        String query_project
+        String? query_labels


Array[String]

kcibul · 2021-05-05T18:33:56Z

scripts/variantstore/wdl/extract/create_cohort_extract_data_table.py

@@ -57,7 +59,9 @@ def execute_with_retry(label, sql):
  start = time.time()
  while len(retry_delay) > 0:
    try:
-      query = client.query(sql)
+      labelValue = label.replace(" ","-").strip().lower()


rename the parameter to this function (above) to be "query_name" or something since label means something else now...
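
The rename suggested here also makes the normalization step easier to read. A small sketch, with `query_name_to_label` as a hypothetical helper name: it strips surrounding whitespace before replacing spaces, which differs from the diff above, where `replace` runs first and leading or trailing spaces would become dashes.

```python
def query_name_to_label(query_name):
    """Convert a human-readable query name into a label-friendly value."""
    # Strip first so padding does not turn into leading/trailing dashes.
    return query_name.strip().replace(" ", "-").lower()


print(query_name_to_label(" Populate Final Extract Table "))  # populate-final-extract-table
```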

kcibul · 2021-05-05T18:39:46Z

src/main/java/org/broadinstitute/hellbender/tools/variantdb/nextgen/ExtractFeaturesEngine.java

+        labelForQuery.put("gvs_query_name", "extract-features");
+        // add additional key value pair labels
+
+        // Each resource can have multiple labels, up to a maximum of 64. -- labelKeys has to be !>64


technically !> 64 minus your two static labels above right?

right---which is why I need to throw if it's more than 62!
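
The arithmetic in this exchange (64 labels per resource, minus the two static ones, leaves room for 62 custom labels) could be checked as below. It is written in Python for brevity rather than in the tool's Java, and `MAX_LABELS`, `STATIC_LABELS`, and `check_label_budget` are hypothetical names.

```python
MAX_LABELS = 64  # per-resource limit quoted in the code comment above
STATIC_LABELS = {"gvs_tool_name": "extract-features", "gvs_query_name": "extract-features"}


def check_label_budget(custom_labels):
    """Raise if the custom labels plus the static ones would exceed the limit."""
    allowed = MAX_LABELS - len(STATIC_LABELS)  # 62 with two static labels
    if len(custom_labels) > allowed:
        raise ValueError(
            f"{len(custom_labels)} custom labels supplied, at most {allowed} allowed"
        )


check_label_budget({f"k{i}": "v" for i in range(62)})  # exactly at the limit: passes
```

Deriving the allowance from `len(STATIC_LABELS)` means the check stays correct if another static label is added later.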

kcibul · 2021-05-05T18:40:56Z

src/main/java/org/broadinstitute/hellbender/tools/variantdb/nextgen/ExtractFeaturesEngine.java


        createVQSRInputFromTableResult(storageAPIAvroReader);
    }

+    private Map<String, String>  createQueryLabels(List<String> labelStringList) {


This is a great candidate for some unit tests...

do I just want to unit test createQueryLabels? or ExtractFeaturesEngine?

kcibul · 2021-05-18T20:57:28Z

scripts/variantstore/wdl/extract/create_cohort_extract_data_table.py

-    default_config = QueryJobConfig(labels={ "id" : f"test_cohort_export_{output_table_prefix}"}, priority="INTERACTIVE", use_query_cache=False )
+    # this is where a set of labels are being created for the cohort extract. Do we want the hardcoded one to be different?
+    query_labels_map = {}
+    query_labels_map["id"]= f"test_cohort_export_{output_table_prefix}"


let's have "id" be just the {output_table_prefix}. It just uniquely identifies this run

kcibul · 2021-05-18T20:59:57Z

src/main/java/org/broadinstitute/hellbender/tools/variantdb/SampleList.java

@@ -81,8 +81,13 @@ private TableResult querySampleTable(String fqSampleTableName, String whereClaus
                "SELECT " + SchemaUtils.SAMPLE_ID_FIELD_NAME + ", " + SchemaUtils.SAMPLE_NAME_FIELD_NAME +
                " FROM `" + fqSampleTableName + "`" + whereClause;

+        Map<String, String> labelForQuery = new HashMap<String, String>();
+        labelForQuery.put("gvs_tool_name", "sample-list-creation");


I would remove this, or somehow take the label as a parameter and pass it in. There is no sample list creation tool... this is a helper function used by several tools

hmmmm I assumed we got charged for the query so it would be a line in the billing table---but is that not true?

kcibul · 2021-05-18T21:00:02Z

scripts/variantstore/wdl/extract/create_cohort_extract_data_table.py

+    # this is where a set of labels are being created for the cohort extract. Do we want the hardcoded one to be different?
+    query_labels_map = {}
+    query_labels_map["id"]= f"test_cohort_export_{output_table_prefix}"
+    query_labels_map["gvs_tool_name"]= f"create_cohort_export_{output_table_prefix}"


similarly, let's REMOVE the {output_table_prefix} here and have this just be the tool name (which maybe should be gvs_prepare_callset)

kcibul · 2021-05-18T21:01:21Z

src/main/java/org/broadinstitute/hellbender/tools/variantdb/nextgen/ExtractFeaturesEngine.java

@@ -131,18 +134,65 @@ public void traverse() {
                                                                                             INDEL_QUAL_THRESHOLD);

        final String userDefinedFunctions = ExtractFeaturesBQ.getVQSRFeatureExtractUserDefinedFunctionsString();
+        Map<String, String> labelForQuery = createQueryLabels(queryLabels);


these names are really similar, maybe call this "cleanQueryLabels" or something like that?

kcibul · 2021-05-18T21:02:32Z

...st/java/org/broadinstitute/hellbender/tools/variantdb/nextgen/ExtractFeaturesEngineTest.java

+        labelStringList.add("labelkey=labelvalue");
+        Map<String, String> labelMap = ExtractFeaturesEngine.createQueryLabels(labelStringList);
+        Assert.assertEquals(labelMap.get("gvs_tool_name"), "extract-features");
+        Assert.assertEquals(labelMap.get("gvs_query_name"), "extract-features");


don't you want to assert that your new label is also in there?

hahaha true

kcibul · 2021-05-18T21:03:28Z

...st/java/org/broadinstitute/hellbender/tools/variantdb/nextgen/ExtractFeaturesEngineTest.java

+import java.util.List;
+import java.util.Map;
+
+public class ExtractFeaturesEngineTest extends GATKBaseTest {


nice set of test cases!

kcibul · 2021-05-21T16:01:33Z

src/main/java/org/broadinstitute/hellbender/tools/gvs/common/SampleList.java

-        if (sampleTableName != null) {
-            initializeMaps(new TableReference(sampleTableName, SchemaUtils.SAMPLE_FIELDS), executionProjectId, printDebugInformation);
+    public SampleList(String sampleTableName, File sampleFile, String executionProjectId, boolean printDebugInformation, String originTool) {
+        if (sampleTableName != null && originTool != null) {


What happens if a caller supplies a sampleTableName but null for originTool?

RoriCremer changed the base branch from master to ah_var_store April 26, 2021 20:39

RoriCremer changed the title ~~Rc 279 labels~~ 279 labels Apr 28, 2021

RoriCremer marked this pull request as ready for review April 28, 2021 17:09

RoriCremer force-pushed the rc-279-labels branch 3 times, most recently from 068eba9 to 167353e Compare May 4, 2021 04:16

mmorgantaylor reviewed May 5, 2021

ahaessly approved these changes May 5, 2021

RoriCremer force-pushed the rc-279-labels branch from 167353e to 94a1972 Compare May 5, 2021 18:42

kcibul requested changes May 5, 2021

RoriCremer force-pushed the rc-279-labels branch from 94a1972 to 968dc02 Compare May 5, 2021 18:58

RoriCremer force-pushed the rc-279-labels branch from 4fd3112 to f9b0b61 Compare May 18, 2021 18:49

kcibul reviewed May 18, 2021

RoriCremer force-pushed the rc-279-labels branch 2 times, most recently from 8d90f21 to b0d140e Compare May 19, 2021 18:37

RoriCremer added 6 commits May 21, 2021 11:19

extract features -- add query labels

c943ced

add a hardcoded label k-v pair add loop to grab custom vals

make it json friendly

4e8d06b

add label key and value validation

9397404

do we want to?

33b6175

add labels to python extract cohort script

a635b4f

add labels to prepare callset wdl

d3d8113

RoriCremer added 23 commits May 21, 2021 11:19

cleaner hardcoding

52f9a8f

better python breakdown---needs a test

2b5c9b2

make the query labels optional

4768daf

make label params optional

b07ebff

properly pass existing labels thru

7a5d5ba

add python validation

b404676

keys not labels!

9952228

add sample list creation labels

ab9bebc

correct wdl input

c36288d

better comments

e0f34c7

correct wdl inputs

9d07b15

get the wdl right!

8906d4b

add unit tests to the parsing/validating of the labels

798f202

update wdl langauge

a6ed51e

cleanup python

0883df1

take in multiple values for params

a8c3392

better unit test

0f18b03

rename labels

9cdee7c

clearer wdl for array param

4a5e59c

pass through tool label to sample list

2b54b0a

add labels to all the tools that use sample list

67f97c7

add cleanup to sample list labels

8c71afe

add notes for string coercion

f52e39e

RoriCremer force-pushed the rc-279-labels branch from 641678f to f52e39e Compare May 21, 2021 15:27

kcibul reviewed May 21, 2021

fix rebase error

ca4c99c

RoriCremer merged commit d37b9c8 into ah_var_store May 21, 2021

RoriCremer deleted the rc-279-labels branch May 21, 2021 21:51

This was referenced Mar 17, 2023

lb merge gvs branch #8248

Closed

testing something, please ignore #8251

Closed
