
Building reference panel

jinghuazhao edited this page Jun 3, 2018 · 1 revision

In general, the weights have to be generated for your own data. The TWAS software contains two command files:

  • TWAS_get_weights.sh, which obtains weights (.ld, .cor, .map) from a PLINK map/ped pair for a particular locus. It is a wrapper around an R program.

  • TWAS.sh, which conducts imputation as reported in Gusev et al. (2016).

Minor changes to the scripts may be required for your own data. The tasks involved are to

  • extract SNPs in a gene from 1000Genomes imputed data into PLINK map/ped files

  • obtain .ld, .cor and .map with TWAS_get_weights.sh for that gene

  • select summary statistics (.zscore) for the gene

  • conduct imputation with TWAS.sh into file .imp

  • repeat the above steps for all genes and collect the results
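The first task in the list can be sketched in shell as below. This is only a sketch under stated assumptions: the reference data are taken to be available as a PLINK binary fileset (the prefix name is hypothetical), gene coordinates are supplied as "gene chrom start end" lines, and the function names plink_extract_cmd and plink_extract_all are illustrative. The functions print the PLINK commands rather than run them, so they can be inspected first. The calls to TWAS_get_weights.sh and TWAS.sh are left out because their arguments are not described on this page.

```shell
# plink_extract_cmd PREFIX GENE CHR START END
# Print (do not run) a PLINK command that writes GENE.ped/GENE.map
# for the SNPs located in CHR:START-END.
# PREFIX is a hypothetical PLINK binary fileset (.bed/.bim/.fam).
plink_extract_cmd() {
  printf 'plink --bfile %s --chr %s --from-bp %s --to-bp %s --recode --out %s\n' \
    "$1" "$3" "$4" "$5" "$2"
}

# plink_extract_all PREFIX
# Read whitespace-delimited "gene chrom start end" lines from standard input
# and print one extraction command per gene.
plink_extract_all() {
  while read -r gene chrom start end; do
    plink_extract_cmd "$1" "$gene" "$chrom" "$start" "$end"
  done
}
```

Once the printed commands look right, they could be run by piping the output to sh, for example plink_extract_all ref < genes.txt | sh, where ref and genes.txt are placeholder names.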

The gene boundaries can be obtained from UCSC as follows:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select * from refGene' > refGene.txt
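Since refGene holds one row per transcript, a gene symbol can appear several times. As a minimal sketch, assuming refGene.txt keeps the header line written by mysql and uses the refGene column names chrom, txStart, txEnd and name2, one boundary per gene can be taken as the smallest txStart and the largest txEnd over its transcripts. The function name gene_bounds and the output layout are illustrative and not part of TWAS.

```shell
# gene_bounds FILE
# Print tab-delimited "gene chrom start end", one line per gene symbol and chromosome.
# Assumes FILE is tab-delimited with a header naming chrom, txStart, txEnd and name2.
gene_bounds() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # locate columns by name
    {
      key = $(col["name2"]) "\t" $(col["chrom"])
      s = $(col["txStart"]) + 0
      e = $(col["txEnd"]) + 0
      if (!(key in lo) || s < lo[key]) lo[key] = s            # smallest start so far
      if (!(key in hi) || e > hi[key]) hi[key] = e            # largest end so far
    }
    END { for (k in lo) print k "\t" lo[k] "\t" hi[k] }
  ' "$1" | sort -k2,2 -k3,3n                                  # order by chromosome, then start
}
```

For example, gene_bounds refGene.txt > genes.txt would write the collapsed boundaries to genes.txt (a placeholder file name). Genes that appear on more than one chromosome or contig are kept as separate lines.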

However, it is often necessary to define a region using a list of SNPs. For this purpose, tables such as snp146 in hg19 above are needed. From locuszoom-1.3 (Pruim et al. 2010) we can extract refFlat.txt and snp_pos.txt (see lz.sql) to build a list of SNP-gene pairs, as with the UK Biobank Axiom chip annotation, Axiom_UKB_WCSG.na34.annot.csv.zip. Chromosome-specific counterparts, such as the SNPs within all genes, can also be derived. A Stata program, lz.do, which calls refGene.do, was developed in collaboration with Dr Jian'an Luan to facilitate handling of gene boundaries.
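Building the SNP-gene pairs amounts to an interval lookup, which can be sketched with awk as below. This is not the lz.do program; it is an illustration under assumed input layouts: a genes file with "gene chrom start end" and a SNP file with "snp chrom pos", both whitespace-delimited, without headers, and using the same chromosome naming. The function name snp_gene_pairs and the optional flank argument are hypothetical.

```shell
# snp_gene_pairs GENES SNPS [FLANK]
# Print tab-delimited "snp gene" for every SNP that falls inside a gene.
# GENES: gene, chrom, start, end (no header; assumed layout)
# SNPS:  snp, chrom, pos         (no header; assumed layout)
# FLANK: optional number of base pairs added on both sides of each gene (default 0)
snp_gene_pairs() {
  awk -v flank="${3:-0}" '
    NR == FNR {                      # first file: store the gene intervals
      n++
      g[n] = $1; c[n] = $2
      s[n] = $3 - flank; e[n] = $4 + flank
      next
    }
    {                                # second file: test each SNP against every gene
      for (i = 1; i <= n; i++)
        if ($2 == c[i] && $3 + 0 >= s[i] && $3 + 0 <= e[i])
          print $1 "\t" g[i]
    }
  ' "$1" "$2"
}
```

The scan compares every SNP with every gene, so it is best run on chromosome-specific files. Intervals are treated as closed on both ends, which is a simplification, and a SNP lying in overlapping genes is reported once per gene.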

Reference

Pruim RJ, et al. (2010). LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics, 26, 2336-2337.
