Work at NIST: snakemake workflow to get short tandem repeats from WGS GIAB samples. R script for downstream processing and concordance

SammedMandape/NIST

Bioinformatics pipeline

The Snakemake workflow described here generates sequences for both haplotypes, given a BED file of loci and a VCF file. Optionally, you can also convert the multiline FASTA output into single-line records. The image shows the entire bioinformatics pipeline, with applications using targeted forensic markers.
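As an illustration of the optional multiline-to-one-line FASTA conversion, here is a minimal Python sketch. It is not taken from the workflow itself: the function name `linearize_fasta` and the record names in the example are hypothetical, and the sketch assumes the input starts with a `>` header line.

```python
from typing import Iterable, Iterator


def linearize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each record's sequence joined onto one line.

    Assumption: the input begins with a '>' header; any sequence text
    appearing before the first header is ignored.
    """
    seq_parts = []
    have_record = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if have_record:
                # Flush the sequence of the previous record.
                yield "".join(seq_parts)
            yield line
            seq_parts = []
            have_record = True
        else:
            seq_parts.append(line)
    if have_record:
        # Flush the sequence of the final record.
        yield "".join(seq_parts)


# Hypothetical two-haplotype example for a single locus.
example = [">locus1_hap1", "ACGT", "TTGA", ">locus1_hap2", "ACGTAC"]
for out_line in linearize_fasta(example):
    print(out_line)
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly, so large FASTA files do not need to be read into memory at once.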

The particular application for which this workflow was used involved a VCF file from a de novo assembly generated by the GIAB team at NIST. However, the Snakemake workflow can be applied broadly to any VCF file and loci of interest.
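To make the idea of deriving two haplotype sequences from a VCF concrete, the following toy Python sketch applies phased, biallelic variants to a reference sequence for one locus. It is only an illustration under stated assumptions and not the workflow's implementation: the function `apply_haplotypes`, the `Variant` tuple layout, and the example data are all hypothetical, positions are 0-based offsets within the locus (real VCF positions are 1-based), and multiallelic or overlapping variants are not handled.

```python
from typing import List, Tuple

# Hypothetical record layout for this sketch:
# (0-based offset within the locus, REF allele, ALT allele, phased genotype)
Variant = Tuple[int, str, str, Tuple[int, int]]


def apply_haplotypes(ref_seq: str, variants: List[Variant]) -> Tuple[str, str]:
    """Return (haplotype 1, haplotype 2) after applying phased variants."""
    haplotypes = []
    for hap_index in (0, 1):
        pieces = []
        cursor = 0  # next reference position not yet copied
        for pos, ref, alt, genotype in sorted(variants):
            if genotype[hap_index] != 1:
                continue  # this haplotype carries the reference allele
            if pos < cursor:
                raise ValueError(f"overlapping variant at offset {pos}")
            if ref_seq[pos:pos + len(ref)] != ref:
                raise ValueError(f"REF allele mismatch at offset {pos}")
            pieces.append(ref_seq[cursor:pos])  # unchanged reference stretch
            pieces.append(alt)                  # substitute the ALT allele
            cursor = pos + len(ref)
        pieces.append(ref_seq[cursor:])         # remainder of the locus
        haplotypes.append("".join(pieces))
    return haplotypes[0], haplotypes[1]


# Hypothetical locus with one SNV on haplotype 1 and one insertion on haplotype 2.
reference = "ACGTACGT"
calls = [(2, "G", "T", (1, 0)), (4, "A", "AGATA", (0, 1))]
hap1, hap2 = apply_haplotypes(reference, calls)
print(hap1)
print(hap2)
```

The REF check guards against a mismatch between the VCF and the reference sequence, which is a common source of silently wrong output when coordinates or genome builds differ.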

Pipeline Image

To run the workflow on 32 cores, use

snakemake -s vcf2seq_v2.smk -c32

For a dry run that prints the shell commands without executing them, use

snakemake -nps vcf2seq_v2.smk -c32

To look at the summary of the snakemake outputs, use

snakemake -s vcf2seq_v2.smk -c32 --summary
