
groups in slivar

Brent Pedersen edited this page Feb 25, 2019 · 6 revisions

motivation

Groups let a user define aliases (labels) for samples so that a single expression can be applied to many groups of samples.

examples

quartet

As a simple example, suppose we have 3 families, each with a mom, dad, proband, and unaffected sibling. Given sample ids s1..s12 that appear in the VCF, we could create an alias file like:

#proband dad mom sibling
s1       s2  s3      s4
s5       s6  s7      s8
s9       s10 s11     s12

where the headers indicate the labels that we can use in a --group-expr. Then a --group-expr might look like:

--group-expr "denovo:mom.alts == 0 && dad.alts == 0 && sibling.alts == 0 \
    && proband.alts == 1 \
    && mom.AD[1] == 0 && dad.AD[1] == 0 && sibling.AD[1] == 0 \
    && proband.AB > 0.2 && proband.AB < 0.8 \
    && INFO.gnomad_popmax_af < 1e-3"

Line by line, this requires that all unaffected samples are hom-ref, that the proband is heterozygous, that no alternate alleles are seen in the unaffected samples, that the proband's allele balance is reasonable, and that the variant is rare in gnomAD.

This would add an INFO field of denovo=$proband to any variant that matches these criteria. The sample in the first column, in this case proband, is used as the entry in the INFO field. Note that the labels are for human readability only; they can be whatever the user chooses. For example, the header above could instead be #affected mom dad unaffected if that makes the expressions more readable.
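
To make the row-by-row behavior concrete, here is a minimal Python sketch of how an alias file could be read and an expression applied once per row. It is not slivar's implementation: `parse_alias_file`, `apply_group_expr`, and the toy genotype data are hypothetical names and values chosen for illustration, and the expression is simplified to the genotype checks only.

```python
from types import SimpleNamespace


def parse_alias_file(text):
    """Return (labels, groups), where groups holds one {label: sample_id} dict per row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The header starts with '#'; its whitespace-separated fields are the labels.
    labels = lines[0].lstrip("#").split()
    groups = []
    for ln in lines[1:]:
        ids = ln.split()
        if len(ids) != len(labels):
            raise ValueError(f"expected {len(labels)} columns, got {len(ids)}: {ln!r}")
        groups.append(dict(zip(labels, ids)))
    return labels, groups


def apply_group_expr(labels, groups, samples, expr):
    """Evaluate expr once per group; collect the first-column sample id of each passing group."""
    hits = []
    for group in groups:
        # Bind each label (mom, dad, ...) to that group's sample record.
        bound = {label: samples[sample_id] for label, sample_id in group.items()}
        if expr(**bound):
            hits.append(group[labels[0]])
    return hits


ALIASES = """\
#proband dad mom sibling
s1       s2  s3      s4
s5       s6  s7      s8
s9       s10 s11     s12
"""

labels, groups = parse_alias_file(ALIASES)

# Toy data for a single variant (an assumption): everyone is hom-ref except proband s5.
samples = {f"s{i}": SimpleNamespace(alts=0) for i in range(1, 13)}
samples["s5"].alts = 1


def denovo(proband, dad, mom, sibling):
    # Simplified version of the expression above: genotype checks only.
    return mom.alts == 0 and dad.alts == 0 and sibling.alts == 0 and proband.alts == 1


denovo_hits = apply_group_expr(labels, groups, samples, denovo)
print("denovo=" + ",".join(denovo_hits))  # prints "denovo=s5"
```

Because the labels are bound by column position, renaming the header changes only the names available to the expression, not which samples are tested.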

somatic variants

For somatic variants, the intuitive labels may be "tumor" and "normal"; for 4 patients, each with 3 tumor time-points, a file might look like:

#normal  tumor1   tumor2   tumor3
s1n      s1t1     s1t2     s1t3      
s2n      s2t1     s2t2     s2t3      
s3n      s3t1     s3t2     s3t3      
s4n      s4t1     s4t2     s4t3      

Then, to find somatic variants that increase in allele frequency across the tumor time-points, we can specify an expression like:

--group-expr "increasing:normal.alts == 0 && normal.AD[1] == 0 \
       && tumor1.AB > 0 && tumor2.AB > tumor1.AB && tumor3.AB > tumor2.AB"

The first line requires that there is no evidence of the variant in the normal sample.

This will create a new INFO field, increasing, that lists the normal (first column) samples that met those criteria for each variant.
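
The logic of the increasing expression can be checked on toy data with a short Python sketch. This is illustrative only: it assumes allele balance is the alternate read depth divided by the total depth, the read depths are made up, `allele_balance` and `is_increasing` are hypothetical helpers, and the `normal.alts == 0` genotype check is omitted for brevity.

```python
def allele_balance(ad):
    """Fraction of reads supporting the alternate allele, given AD = (ref, alt)."""
    ref, alt = ad
    total = ref + alt
    return alt / total if total > 0 else 0.0


def is_increasing(normal_ad, tumor_ads):
    """True if the normal has no alt reads and AB rises strictly across the tumor time-points."""
    if normal_ad[1] != 0:
        return False
    balances = [allele_balance(ad) for ad in tumor_ads]
    return balances[0] > 0 and all(a < b for a, b in zip(balances, balances[1:]))


# Hypothetical (ref, alt) read depths at one variant, keyed by the normal sample id.
patients = {
    "s1n": ((30, 0), [(28, 2), (20, 10), (10, 20)]),   # rises at every time-point
    "s2n": ((30, 0), [(20, 10), (25, 5), (10, 20)]),   # dips at tumor2
    "s3n": ((29, 1), [(28, 2), (20, 10), (10, 20)]),   # alt read seen in the normal
    "s4n": ((30, 0), [(30, 0), (20, 10), (10, 20)]),   # absent at tumor1
}

increasing = [
    normal_id
    for normal_id, (normal_ad, tumor_ads) in patients.items()
    if is_increasing(normal_ad, tumor_ads)
]
print("increasing=" + ",".join(increasing))  # prints "increasing=s1n"
```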
