Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

279 labels #7233

Merged
merged 33 commits into from
May 21, 2021
Merged

279 labels #7233

merged 33 commits into from
May 21, 2021

Conversation

RoriCremer
Copy link
Contributor

@RoriCremer RoriCremer commented Apr 26, 2021

For GVS Feature Extract, Cohort Extract and Prepare Callset we should add a bq labels to indicate the query and tool being

gvs_tool_name (e.g. feature-extract)
gvs_query_name (e.g. read-sample-table)

Python Prepare Callset:

Screen Shot 2021-05-19 at 9 31 04 AM

Java GVS Feature Extract:
Screen Shot 2021-05-19 at 9 31 25 AM

for Feature Extract
update the wdl to take in a query_labels optional string
update the GATK tool to take in a query_labels param
update the GATK tool to validate labels
update the GATK tool to add constant kv labels: "gvs_tool_name", "extract-features" and "gvs_query_name", "extract-features" (is there a way to get more explicit in the queries? isn't it just one query?)
test that this works with and without a label param passed in

for Prepare Callset
update the wdl to take in a query_labels optional string
update the python script to take in a query_labels param
update the python scrip to validate passed in labels
update the python script to add constant kv labels for ever single query individually and as a default
test that this works with and without a label param passed in

Screen Shot 2021-05-05 at 2 40 30 PM

@RoriCremer RoriCremer changed the base branch from master to ah_var_store April 26, 2021 20:39
@RoriCremer RoriCremer changed the title Rc 279 labels 279 labels Apr 28, 2021
@RoriCremer RoriCremer marked this pull request as ready for review April 28, 2021 17:09
@RoriCremer RoriCremer force-pushed the rc-279-labels branch 3 times, most recently from 068eba9 to 167353e Compare May 4, 2021 04:16

#Default QueryJobConfig will be merged into job configs passed into the query method.
# TODO I'm worried about how well the labels will be merged....
default_config = QueryJobConfig(labels=query_labels_map, priority="INTERACTIVE", use_query_cache=False )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this get overwritten when you define labels later? how did you resolve this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't get overwritten (see screenshot with default and query based labels)
in order to make sure they dont get overwritten, any new labels must be added to the client._default_query_job_config.labels that already exist

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The screenshots seem to be from the Java based tools, do you know how this python works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the screen shots is from the python I promise!!!!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the java one is the second screen shot and is pretty boring because it's only one query
the python one is the first one and is much more interesting because it's a series of queries, and you can see the individual ones are now labeled to keep the initial label that was previously there and now can have the custom one (where I shove in my name) AND one that's based on the query specifically eg: "populate-final-extract-table"

@@ -65,6 +66,13 @@
optional = true)
protected double hqGenotypeABThreshold = 0.2;

@Argument(
fullName = "query-labels",
doc = "Key-value pairs to be added to the extraction BQ query. Ex: --query-labels label1=value1 --query-labels label2=value2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be clearer to have Ex: --query-labels key1=value1 --query-labels key2=value2 ? or if that's not right, how are the keys defined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this comment, but is that what's above (or was this already addressed?)

Copy link
Contributor

@ahaessly ahaessly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. just a couple of q's

@@ -47,6 +48,7 @@ task PrepareCallsetTask {
input {
String destination_cohort_table_name
String query_project
String? query_labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected this to be Array[String]? and each label in the form "key:value". I think when you pass this to the command wdl knows how to make this into a multiple argument option. @kcibul can you confirm?
(same comment for the CreateFilterSet wdl)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm good point--I'm not sure which is best

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes -- I was making the same comments above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean... it doesn't work the way it is right? What would the user pass to the WDL to supply multiple tables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack you're right---the one case I neglected to smoke test was using the wdl (and not just the plain python) with multiple custom labels

@@ -275,6 +303,7 @@ def make_extract_table(fq_pet_vet_dataset,
parser.add_argument('--destination_table',type=str, help='destination table', required=True)
parser.add_argument('--fq_cohort_sample_names',type=str, help='FQN of cohort table to extract, contains "sample_name" column', required=True)
parser.add_argument('--query_project',type=str, help='Google project where query should be executed', required=True)
parser.add_argument('--query_labels',type=str, help='Labels to put on the query that will show up in the billing', required=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the labels need to be in a specific format, maybe add that to the help

@@ -13,6 +13,7 @@ workflow GvsCreateFilterSet {
String default_dataset

String query_project = data_project
String? query_labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array[String] ?

@@ -210,6 +212,7 @@ task ExtractFilterTask {
File? gatk_override
File? service_account_json
String query_project
String? query_labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array[String]

@@ -47,6 +48,7 @@ task PrepareCallsetTask {
input {
String destination_cohort_table_name
String query_project
String? query_labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes -- I was making the same comments above

@@ -65,6 +66,13 @@
optional = true)
protected double hqGenotypeABThreshold = 0.2;

@Argument(
fullName = "query-labels",
doc = "Key-value pairs to be added to the extraction BQ query. Ex: --query-labels label1=value1 --query-labels label2=value2",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this comment, but is that what's above (or was this already addressed?)

@@ -57,7 +59,9 @@ def execute_with_retry(label, sql):
start = time.time()
while len(retry_delay) > 0:
try:
query = client.query(sql)
labelValue = label.replace(" ","-").strip().lower()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename the parameter to this function (above) to be "query_name" or something since label means something else now...

labelForQuery.put("gvs_query_name", "extract-features");
// add additional key value pair labels

// Each resource can have multiple labels, up to a maximum of 64. -- labelKeys has to be !>64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically !> 64 minus your two static labels above right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right---which is why I need to throw if it's more than 62!


createVQSRInputFromTableResult(storageAPIAvroReader);
}

private Map<String, String> createQueryLabels(List<String> labelStringList) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great candidate for some unit tests...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do I just want to unit test createQueryLabels? or ExtractFeaturesEngine?

@@ -47,6 +48,7 @@ task PrepareCallsetTask {
input {
String destination_cohort_table_name
String query_project
String? query_labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean... it doesn't work the way it is right? What would the user pass to the WDL to supply multiple tables?


#Default QueryJobConfig will be merged into job configs passed into the query method.
# TODO I'm worried about how well the labels will be merged....
default_config = QueryJobConfig(labels=query_labels_map, priority="INTERACTIVE", use_query_cache=False )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The screenshots seem to be from the Java based tools, do you know how this python works?

default_config = QueryJobConfig(labels={ "id" : f"test_cohort_export_{output_table_prefix}"}, priority="INTERACTIVE", use_query_cache=False )
# this is where a set of labels are being created for the cohort extract. Do we want the hardcoded one to be different?
query_labels_map = {}
query_labels_map["id"]= f"test_cohort_export_{output_table_prefix}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's have "id" be just the {output_table_prefix}. It just uniquely identify this run

@@ -81,8 +81,13 @@ private TableResult querySampleTable(String fqSampleTableName, String whereClaus
"SELECT " + SchemaUtils.SAMPLE_ID_FIELD_NAME + ", " + SchemaUtils.SAMPLE_NAME_FIELD_NAME +
" FROM `" + fqSampleTableName + "`" + whereClause;

Map<String, String> labelForQuery = new HashMap<String, String>();
labelForQuery.put("gvs_tool_name", "sample-list-creation");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this, or somehow take the label as a parameter and pass it in. There is no sample list creation tool... this is a helper function used by several tools

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmm I assumed we got charged for the query so it would be a line in the billing table---but is that not true?

# this is where a set of labels are being created for the cohort extract. Do we want the hardcoded one to be different?
query_labels_map = {}
query_labels_map["id"]= f"test_cohort_export_{output_table_prefix}"
query_labels_map["gvs_tool_name"]= f"create_cohort_export_{output_table_prefix}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, let's REMOVE the {output_table_prefix} here and have this just be the tool name (which, maybe should be gvs_prepare_callset

@@ -131,18 +134,65 @@ public void traverse() {
INDEL_QUAL_THRESHOLD);

final String userDefinedFunctions = ExtractFeaturesBQ.getVQSRFeatureExtractUserDefinedFunctionsString();
Map<String, String> labelForQuery = createQueryLabels(queryLabels);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these names are really similar, maybe call this "cleanQueryLabels" or something like that?

labelStringList.add("labelkey=labelvalue");
Map<String, String> labelMap = ExtractFeaturesEngine.createQueryLabels(labelStringList);
Assert.assertEquals(labelMap.get("gvs_tool_name"), "extract-features");
Assert.assertEquals(labelMap.get("gvs_query_name"), "extract-features");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you want to assert that your new label is also in there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hahaha true

import java.util.List;
import java.util.Map;

public class ExtractFeaturesEngineTest extends GATKBaseTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice set of test cases!

@RoriCremer RoriCremer force-pushed the rc-279-labels branch 2 times, most recently from 8d90f21 to b0d140e Compare May 19, 2021 18:37
if (sampleTableName != null) {
initializeMaps(new TableReference(sampleTableName, SchemaUtils.SAMPLE_FIELDS), executionProjectId, printDebugInformation);
public SampleList(String sampleTableName, File sampleFile, String executionProjectId, boolean printDebugInformation, String originTool) {
if (sampleTableName != null && originTool != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a caller supplies a sampleTableName but null for originTool?

@RoriCremer RoriCremer merged commit d37b9c8 into ah_var_store May 21, 2021
@RoriCremer RoriCremer deleted the rc-279-labels branch May 21, 2021 21:51
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants