VS-280 Create a VAT intermediary #7657

Merged
merged 20 commits into from
Feb 25, 2022

@RoriCremer RoriCremer commented Jan 31, 2022

Add an extensive and instructional ReadMe

Move the expensive step and the saving data step into a subworkflow so that they can complete their mission together in harmony even when a fellow shard has failed.

@RoriCremer RoriCremer force-pushed the rc-vat-intermediary branch 2 times, most recently from 91a0ecc to 7050d09 Compare January 31, 2022 20:38
@RoriCremer RoriCremer marked this pull request as ready for review February 10, 2022 23:22
@rsasch rsasch self-requested a review February 11, 2022 15:21

@rsasch rsasch left a comment

Apologies — a lot of formatting/English and "please create a ticket" comments.

Outdated, resolved review comments on:
scripts/variantstore/variant_annotations_table/ReadMe.md (5 comments)
scripts/variantstore/wdl/GvsCreateVATAnnotations.wdl (4 comments)

@rsasch rsasch left a comment

Some more changes requested for clarity. Also, it didn't look like any of my comments for the "Notes" section or the WDL files went through?

@RoriCremer RoriCremer force-pushed the rc-vat-intermediary branch 2 times, most recently from 2282d75 to ad6455f Compare February 18, 2022 17:29

@rsasch rsasch left a comment

Thank you for all the changes! Only one more thing — I don't think the links to other files (the WDLs and the example inputs json) work. I might have steered you wrong on that one.

Comment on lines 39 to 40
| workspace_namespace | name of the current workspace namespace | ## is this still needed?
| workspace_name | name of the current workspace | ## is this still needed?

I don't see either of these inputs in GvsAssignIds, so feel free to remove them.

The first two of these inputs are files: one lists the files/VCFs/shards you want to use for the VAT, and the other lists their corresponding index files. These are labelled `inputFileofFileNames` and `inputFileofIndexFileNames` and need to be copied into a GCP bucket that this pipeline has access to (e.g. this bucket: `gs://aou-genomics-curation-prod-processing/vat/`) for easy access during the workflow.
The third input is the ancestry file from the ancestry pipeline, which will be used to calculate AC, AN and AF for all subpopulations. It needs to be copied into a GCP bucket that this pipeline has access to. This input is labelled `ancestry_file`.

Most of the other inputs are specific to where the VAT will live, such as the project_id, the dataset_name, and the table_suffix (which names the VAT itself as vat_`table_suffix`), as well as a GCP bucket location, the output_path, for the intermediary files and the VAT export in tsv form.

I thought the export for the VAT was CSV not TSV?
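
For readers following the input description quoted above, here is a minimal Python sketch of how those inputs might be assembled and sanity-checked before launching the workflow. The input labels come from the quoted ReadMe text; the helper names `build_vat_inputs` and `vat_table_name` are hypothetical, as are any example paths you pass in, and the keys are shown without a workflow-name prefix because the prefix is not given in this thread.

```python
import json


def vat_table_name(table_suffix):
    """Return the VAT table name, which the ReadMe describes as vat_<table_suffix>."""
    return f"vat_{table_suffix}"


def build_vat_inputs(file_of_file_names, file_of_index_file_names, ancestry_file,
                     project_id, dataset_name, table_suffix, output_path):
    """Assemble the inputs described in the ReadMe into one dict (hypothetical helper).

    All file inputs and the output location are expected to live in a GCP bucket,
    so each of them is checked for a gs:// prefix.
    """
    gcs_inputs = {
        "inputFileofFileNames": file_of_file_names,
        "inputFileofIndexFileNames": file_of_index_file_names,
        "ancestry_file": ancestry_file,
        "output_path": output_path,
    }
    for name, path in gcs_inputs.items():
        if not path.startswith("gs://"):
            raise ValueError(f"{name} must be a gs:// path, got {path!r}")

    inputs = dict(gcs_inputs)
    inputs.update(
        project_id=project_id,
        dataset_name=dataset_name,
        table_suffix=table_suffix,
    )
    return inputs


def to_inputs_json(inputs):
    """Serialize the inputs dict as pretty-printed JSON."""
    return json.dumps(inputs, indent=2, sort_keys=True)
```

Checking the `gs://` prefix up front is only a convenience: it catches a local path before any shard is scheduled, which is cheaper than discovering it inside the workflow.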


These numbers are cumulative. The names of these JSON files are retained from the original shard names to avoid collisions. If you run the same shards through the VAT twice, the second run should overwrite the first, and the total number of JSON files should not change.
Once a shard has made it into the /genes/ and /vt/ directories, the majority of the expense and transformations needed for that shard are complete.
It is then ready to be loaded into BQ. Past this step, all that remains is to create the BQ tables, load them, and run a join query; the remaining steps are all validations or an export into tsv.

CSV?
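
The overwrite behaviour described in the quoted ReadMe text (output names derived from shard names, so a rerun replaces rather than duplicates) can be illustrated with a small Python sketch. This is not the pipeline's code: the bucket is simulated with a plain dict, the helpers `output_keys` and `write_shard` are hypothetical, and stripping a `.vcf.gz` or `.vcf` extension is an assumption about the naming scheme. Only the `/genes/` and `/vt/` directory names come from the text.

```python
import posixpath


def output_keys(output_path, shard_path):
    """Derive the two output object names for a shard from the shard's own name.

    Assumption: the shard name is the basename with a VCF extension removed.
    """
    shard_name = posixpath.basename(shard_path)
    for ext in (".vcf.gz", ".vcf"):
        if shard_name.endswith(ext):
            shard_name = shard_name[: -len(ext)]
            break
    base = output_path.rstrip("/")
    return [f"{base}/{subdir}/{shard_name}.json" for subdir in ("genes", "vt")]


def write_shard(bucket, output_path, shard_path, payload):
    """Write a shard's outputs into the simulated bucket, replacing any earlier run."""
    for key in output_keys(output_path, shard_path):
        bucket[key] = payload


# Running the same shard twice leaves the same two objects, holding the second run's data.
bucket = {}
write_shard(bucket, "gs://example-bucket/vat/out/", "gs://example-bucket/in/shard_001.vcf.gz", "run 1")
write_shard(bucket, "gs://example-bucket/vat/out/", "gs://example-bucket/in/shard_001.vcf.gz", "run 2")
```

Because the object name depends only on the shard name and the output path, the count of JSON files tracks the number of distinct shards processed, not the number of runs.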

@RoriCremer RoriCremer merged commit 3aa3c3b into ah_var_store Feb 25, 2022
@RoriCremer RoriCremer deleted the rc-vat-intermediary branch February 25, 2022 20:33
This was referenced Mar 17, 2023