How to run GIAB comparisons #7237

Merged
merged 6 commits into from
May 4, 2021

Contributor

@kcibul kcibul commented Apr 29, 2021

Addresses https://github.com/broadinstitute/dsp-spec-ops/issues/280

The analysis has been done and delivered; this PR is primarily documentation of how to do it in the future.

If someone wants to test-drive the instructions, there is a GVS VCF at

gs://broad-dsp-spec-ops/scratch/bigquery-jointcalling/comparison-v3/gvs.chr20.vcf.gz*

Contributor

@ahaessly ahaessly left a comment

👍

continue

if (len(sys.argv) > 2 and sys.argv[2] == "loose"):
Contributor

I didn't see this option in the readme. When should this be used?

Contributor Author

documented

Member

@mmorgantaylor mmorgantaylor left a comment

looks great to me on a read-through!

# extract each of the samples
INPUT_VCF=gvs.chr20.vcf.gz

gatk SelectVariants -V ${INPUT_VCF} --sample-name SM-G947Y --select-type-to-exclude NO_VARIATION -O NA12878.gvs.chr20.vcf.gz
Member

This may be minor, but can we document explicitly what `--select-type-to-exclude NO_VARIATION` does? I'm guessing it means only return sites at which this sample has a variant, i.e. exclude ref sites?

Contributor Author

👍


## script to add "AS_MAX_VQSLOD" to VCFs
```
for sample in NA12878 SYNDIP BI_HG002 BI_HG003 UW_HG002
Member

wow 😍 for this block

tabix warp_tieout_acmg_cohort_v1.chr20.vcf.gz

INPUT_VCF=warp_tieout_acmg_cohort_v1.chr20.vcf.gz
SOURCE=warp
Member

Does this need to be `SOURCE="warp"`? And does the `INPUT_VCF` assignment also need quotes? Or is this one of those bash things that doesn't matter if there are no spaces?

Contributor Author

Right, the quotes are only needed when the value contains spaces or special characters, but I changed it just to be consistent.
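
To illustrate the quoting question in this thread, here is a minimal sketch (not part of the PR) showing that the quoted and unquoted assignments give the same value when there are no spaces, and where quotes start to matter. The names `UNQUOTED`, `QUOTED`, `WITH_SPACE`, `count_args`, and the file name containing a space are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: when do quotes matter in a shell assignment?
# All names below are hypothetical, for illustration only.

# No spaces or special characters: both forms assign the same string.
UNQUOTED=warp
QUOTED="warp"

if [ "$UNQUOTED" = "$QUOTED" ]; then
  echo "identical: $UNQUOTED"
fi

# With a space, quotes are required on assignment. Without them the shell
# would treat `WITH_SPACE=my` as a temporary environment assignment and
# try to run `cohort.vcf.gz` as a command.
WITH_SPACE="my cohort.vcf.gz"

# Quotes also matter on expansion: unquoted, the value is split into words.
count_args() { echo "$#"; }
SPLIT_COUNT=$(count_args $WITH_SPACE)     # unquoted: split into two arguments
WHOLE_COUNT=$(count_args "$WITH_SPACE")   # quoted: passed as one argument

echo "unquoted expansion: $SPLIT_COUNT args; quoted expansion: $WHOLE_COUNT arg"
```

Quoting consistently, as the author chose to do, costs nothing and avoids surprises if a path with a space is ever substituted in.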

@@ -1,5 +1,5 @@
 PROJECT="spec-ops-aou"
-DATASET="gvs_tieout_acmg_v1"
+DATASET="gvs_tieout_acmg_v2"
Member

i KNEW we'd need a v2!!!!

Contributor Author

always...

@kcibul kcibul merged commit 7d5aec7 into ah_var_store May 4, 2021
@kcibul kcibul deleted the kc_giab branch May 4, 2021 18:25
- bcftools
- tabix
- python 3.7+

Contributor

@RoriCremer RoriCremer May 5, 2021

My immediate reaction was that I should pip install them, but that's clearly wrong.
I know I'm a noob with this project, but a little more context on the prereqs would be immensely helpful and would have saved me a lot of time googling.

Contributor

add samtools?

Contributor

conda create --name gvs python=3.8
conda activate gvs
conda install -c bioconda samtools=1.9 --force-reinstall
conda install -c bioconda bcftools
conda install -c bioconda rtg-tools

Contributor

@RoriCremer RoriCremer May 18, 2021

also add GATK to prereqs -- should this be done through conda tho? isn't gatk technically part of this repository anyway?
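
The prerequisites discussed in this thread are command-line programs rather than Python packages, so they need to be on the `PATH` instead of being installed with pip. As a minimal sketch (not part of the PR), a pre-flight check along these lines could report what is missing before starting. The helper name `missing_tools` is hypothetical, the tool list is simply collected from the comments here, and `rtg` is assumed to be the executable provided by the rtg-tools package.

```shell
#!/usr/bin/env bash
# Sketch: report which prerequisite command-line tools are not on PATH.
# `missing_tools` is a hypothetical helper, not part of the repository.

# Print the name of every argument that cannot be found as a command.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Tool list taken from this thread; `rtg` is assumed to be the executable
# installed by the rtg-tools package.
MISSING=$(missing_tools bcftools tabix samtools rtg gatk python3)

if [ -n "$MISSING" ]; then
  # Left unquoted on purpose so the names are joined onto one line.
  echo "Missing prerequisites:" $MISSING
else
  echo "All prerequisites found."
fi
```

Running this inside the activated conda environment shows whether the environment actually provides each tool; it does not check versions.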


## Obtain Truth sample VCFs

First, create a full cohort extract (as described in README.md) using the desired filter_set_name. Assuming this is in a single gathered VCF of `gvs.vcf.gz`
Contributor

@RoriCremer RoriCremer May 17, 2021

Would this be more like "once you have a full cohort extract you want to compare" ?

${BASE_CMD} -b truth/CHM.full.38.vcf.gz -e truth/CHM.gvs.evaluation.bed -c SYNDIP.${SOURCE}.chr20.maxas.vcf.gz -o syndip_${SOURCE}${SUFFIX}
```

The do the same thing but use all records
Contributor

@RoriCremer RoriCremer May 18, 2021

"then" ?

This was referenced Mar 17, 2023