KM GVS documentation #7903

Merged: 49 commits into ah_var_store from km-gvs-docs, Jun 29, 2022

@kayleemathews commented Jun 15, 2022

@RoriCremer and I created these docs to provide information about the workflow to beta users and walk them through the steps of running the workflow in the beta workspace on example data as well as their own sample data.

| :----: | :---: | :----: | :--------------: |
| [GvsJointVariantCalling](https://github.com/broadinstitute/gatk/blob/rc-vs-483-beta-user-wdl/scripts/variantstore/wdl/GvsJointVariantCalling.wdl) | June, 2022 | [Kaylee Mathews](mailto:[email protected]) and [Aurora Cremer](mailto:[email protected]) | If you have questions or feedback, contact the [Broad Variants team](mailto:[email protected]) |

![Diagram depicting the Broad Genomic Variant Store workflow. Sample GVCF files are imported into the core data model. A filtering model is trained using Variant Quality Score Recalibration, or VQSR, and then used to extract cohorts and produce sharded joint VCF files. Each step integrates BigQuery and GATK tools.](/scripts/variantstore/genomic-variant-store_diagram.png)

@kayleemathews (Author):

This is a placeholder figure for now until we have time to create one with a little more detail.

codecov bot commented Jun 15, 2022

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@586f3f7).
The diff coverage is n/a.

@@               Coverage Diff                @@
##             ah_var_store     #7903   +/-   ##
================================================
  Coverage                ?   84.757%           
  Complexity              ?     34663           
================================================
  Files                   ?      2170           
  Lines                   ?    164888           
  Branches                ?     17786           
================================================
  Hits                    ?    139754           
  Misses                  ?     18943           
  Partials                ?      6191           

The [GvsCreateAltAllele subworkflow (alias = CreateAltAllele)](https://github.com/broadinstitute/gatk/blob/ah_var_store/scripts/variantstore/wdl/GvsCreateAltAllele.wdl) splits alternate alleles and calculates additional annotations to be used for filtering. GvsCreateAltAllele imports an additional workflow, [GvsUtils (alias = Utils)](https://github.com/broadinstitute/gatk/blob/ah_var_store/scripts/variantstore/wdl/GvsUtils.wdl).

#### B. GvsCreateFilterSet

Review comment:

What is the output of this task? The next section talks about VCFs, but we've been working with GVCF up till this point.

Contributor:

another table in the dataset -- not something the user needs to understand, but it's the output of this that we use to create the filter

Contributor:

this is the first part of the BQ only part of the workflow. We are taking data from one table (or several, if the number of samples is >4k) and moving it into another table (partially for data access patterns!!!)

@kayleemathews (Author):

I simplified this section significantly so I don't think your question is answered in the doc @ekiernan. However, I hope by simplifying it, it makes it seem less like something users need to know the inner workings of?

@kcibul (Contributor) left a comment:

Really nice to see this being created!

At a high level, I think we can make it more concise and approachable for users. It's a bit wordy in places, and we also describe details that won't matter to users or they won't know what to make of it. We can try reading it with the eyes of a completely new person to Terra/GVS (who knows genomics/gvcfs/etc)

It might also be helpful to separate it out into 3 docs?

  • Overview of GVS - what it is, how the parts work at a high/conceptual level. Good start to this at the top
  • QuickStart
  • Running On your own samples

As a new user, it seems to bounce back and forth between all three of these and if I have one goal in mind I get a bit lost. E.g. if I want to just try it out on supplied data (quick start) I might get confused about all the talk with required annotations, scale limitations, etc. But if I'm moving on to using my own samples I def want to see that stuff as pre-requisites

(10 review threads on scripts/variantstore/gvs-overview.md, all outdated and resolved)

@kayleemathews (Author):

@ekiernan and @kcibul thank you for your feedback!

@RoriCremer merged commit 9f857df into ah_var_store on Jun 29, 2022
@RoriCremer deleted the km-gvs-docs branch on June 29, 2022, 20:24