Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incident VS 365 clinvar classification fix #7769

Merged
merged 10 commits into from
Apr 27, 2022

Conversation

RoriCremer
Copy link
Contributor

@RoriCremer RoriCremer commented Apr 8, 2022

In the python file that transformed the annotation jsons into the jsons for big query ingest, there were two typos, preventing "pathogenic" and "likely pathogenic" from being used as values

@RoriCremer RoriCremer changed the base branch from master to ah_var_store April 8, 2022 18:49
@RoriCremer RoriCremer force-pushed the rc-vs-365-clinvar-class branch 2 times, most recently from c409998 to 0cf009a Compare April 8, 2022 19:07
Copy link
Collaborator

@mcovarr mcovarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented on the parts I felt I understood but there's a lot I still don't understand. 🙂

export GOOGLE_APPLICATION_CREDENTIALS=local.service_account.json
gcloud auth activate-service-account --key-file=local.service_account.json

gsutil cp ~{inputFileofFileNames} ~{updated_input_files}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks this line is meant to be outside the if? If has_service_account_file is not true this command block will do nothing and the output expression for input_jsons might not be happy about that. 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 77 should save me in this scenario, but soon, no SA!

export GOOGLE_APPLICATION_CREDENTIALS=local.service_account.json
gcloud auth activate-service-account --key-file=local.service_account.json

gsutil cp ~{annotation_json} ~{updated_annotation_json}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar "should be outside if" situation here

String dataset_name
File? vat_schema_json_file = "gs://broad-dsp-spec-ops/scratch/rcremer/Nirvana/schemas/vat_schema.json"
File? variant_transcript_schema_json_file = "gs://broad-dsp-spec-ops/scratch/rcremer/Nirvana/schemas/vt_schema.json"
File? genes_schema_json_file = "gs://broad-dsp-spec-ops/scratch/rcremer/Nirvana/schemas/genes_schema.json"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a stable location for these files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope---we need to have a conversation about where all the Nirvana stuff and VAT schemas should live

# ------------------------------------------------
# Runtime settings:
runtime {
docker: "us.gcr.io/broad-dsde-methods/variantstore:rc_vat_update_2022_05_06"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up this date is in the future 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well if I keep getting distracted by other work....

echo "Loading data into a pre-vat table ~{dataset_name}.~{variant_transcript_table}"
echo ~{vt_path}
echo ~{genes_path}
bq --location=US load --project_id=~{project_id} --source_format=NEWLINE_DELIMITED_JSON ~{dataset_name}.~{variant_transcript_table} ~{vt_path}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay to proceed with loading data into the variant transcript table if it already existed? The seemingly similar logic for the vat table on line 248 rms the table if it already existed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes---the logic was originally created such that we could keep adding data to the two temp tables--the Variant-Transcript table and the Genes table--and then a fresh VAT would be created based on that.

Now that the usage has changed a bit, reworking the logic and failsafes here would all be helpful

fi

echo "Loading data into a pre-vat table ~{dataset_name}.~{genes_table}"
bq --location=US load --project_id=~{project_id} --source_format=NEWLINE_DELIMITED_JSON ~{dataset_name}.~{genes_table} ~{genes_path}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar question here with genes table

# note: tab delimiter and compression creates tsv.gz files
bq query --nouse_legacy_sql --project_id=~{project_id} \
'EXPORT DATA OPTIONS(
uri="~{export_path}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going straight to GCS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes---and we'll need to create a new step that merges them all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(the specs have changed, so it looks like we wont need to loop over chrs either)

bq --location=US mk --project_id=~{project_id} ~{dataset_name}.~{vat_table} ~{nirvana_schema}
else
bq rm -t -f --project_id=~{project_id} ~{dataset_name}.~{vat_table}
bq --location=US mk --project_id=~{project_id} ~{dataset_name}.~{vat_table} ~{nirvana_schema}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mk line could be outside the if/else to avoid duplication

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes--I think this whole gate should be flipped actually

@codecov
Copy link

codecov bot commented Apr 26, 2022

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@2381a09). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff                @@
##             ah_var_store     #7769   +/-   ##
================================================
  Coverage                ?   86.308%           
  Complexity              ?     35194           
================================================
  Files                   ?      2170           
  Lines                   ?    164837           
  Branches                ?     17775           
================================================
  Hits                    ?    142267           
  Misses                  ?     16248           
  Partials                ?      6322           

gcloud config set project ~{query_project_id}
gsutil cp ~{service_account_json_path} local.service_account.json
gcloud auth activate-service-account --key-file=local.service_account.json
gcloud config set project ~{query_project_id}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preexisting issue but it seems like we mix 2 and 4 space indents in the command blocks of the same WDLs

memory: "8 GB"
preemptible: 5
cpu: "1"
disks: "local-disk 250 SSD"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you need SSD here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No---I'd ideally like to parameterize a bunch of these values

@RoriCremer RoriCremer merged commit 75b5115 into ah_var_store Apr 27, 2022
@RoriCremer RoriCremer deleted the rc-vs-365-clinvar-class branch April 27, 2022 21:13
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants