gu-lab20/RNAseq

Analysis pipeline for RNA-seq data

This pipeline analyzes RNA-seq data and is written in Snakemake. For each sample it reports gene read counts, mutations, gene fusions, and chromosome-level copy number variations (gains/losses) inferred from the RNA-seq data.
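HTSeq-count is among the dependencies listed below, and the gene read counts are presumably produced by it. As a minimal sketch, assuming the count files keep HTSeq-count's default two-column, tab-separated layout (gene ID, count), they can be loaded in Python while separating HTSeq-count's `__`-prefixed summary counters (such as `__no_feature` and `__ambiguous`) from the per-gene counts. The function names are hypothetical and not part of this pipeline.

```python
import csv


def parse_htseq_counts(lines):
    """Split HTSeq-count output lines into per-gene counts and summary counters.

    Assumes the default layout: gene_id<TAB>count, one gene per line,
    followed by summary rows whose names start with "__".
    """
    genes, summary = {}, {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        name, count = row[0], int(row[1])
        # Summary counters are not genes, so keep them apart.
        (summary if name.startswith("__") else genes)[name] = count
    return genes, summary


def read_htseq_counts(path):
    """Read one HTSeq-count output file from disk."""
    with open(path, newline="") as handle:
        return parse_htseq_counts(handle)
```

For example, `genes, summary = read_htseq_counts("out/COH000456_D1/counts.txt")` would load one sample; that path is hypothetical, so substitute the count file actually written under `dir_out`.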

[Figure: workflow of this pipeline]


Dependencies

Install all of the following software before running this pipeline:

perl-v5.26.2

Snakemake-7.16.0

STAR-2.7.2b

samtools-1.12

GATK-4.3.0.0

HTSeq-count-0.13.5

FusionCatcher-1.33

RNApeg

Cicero-0.3.0p2

RNAseqCNV-1.2.2
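Before a first run it can help to confirm that the command-line tools are reachable. The sketch below only checks that executables are on `PATH`; it does not verify the versions listed above. The executable names are assumptions that depend on how each tool was installed, and FusionCatcher, RNApeg, Cicero and RNAseqCNV are left out because their entry points (wrapper scripts or an R package) vary by installation.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_EXECUTABLES = ["perl", "snakemake", "STAR", "samtools", "gatk", "htseq-count"]


def missing_executables(names):
    """Return the names that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables(REQUIRED_EXECUTABLES)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed executables were found on PATH.")
```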

Configuration

All settings are configured by editing the run_rnaseq.smk file.

Parameters:

‘ref_fa’, the FASTA file of the human reference genome GRCh38. Users need to download it.

‘gtf’, the GTF annotation file for the reference genome. Users need to download it.

‘bed_DUX4’, a BED file of the DUX4 genes, used when patching the read counts of the DUX4 genes. Already included in the 0.ref directory.

‘ref_star’, the STAR genome index directory used for alignment. Installing STAR does not create it; users need to build it with STAR (--runMode genomeGenerate) from the reference FASTA and GTF files.

‘ref_fusioncatcher’, the directory of reference data FusionCatcher uses to call gene fusions. It is obtained as part of the FusionCatcher installation.

‘ref_cicero’, the directory of reference data Cicero uses to call gene fusions. It is obtained as part of the Cicero installation.

‘ref_RNApeg_flat’, the refFlat file used by RNApeg. Already included in the 0.ref directory.

‘cores_star’, ‘cores_samtools’, ‘cores_fusioncatcher’, ‘cores_RNApeg’ and ‘cores_cicero’, the number of threads used by the corresponding tool.

‘dir_in’, the directory containing the input FASTQ files. Currently only gzip-compressed paired-end FASTQ files are supported, and they must be named {sample}.R1.fq.gz and {sample}.R2.fq.gz. For example, for sample ID COH000456_D1 the files must be COH000456_D1.R1.fq.gz and COH000456_D1.R2.fq.gz.

‘dir_out’, the output directory. Results for each sample are written to a subdirectory named after the sample ID.

‘samplelist’, the list of sample IDs to process. The corresponding FASTQ files must be stored in ‘dir_in’.
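Because every sample in ‘samplelist’ must have both {sample}.R1.fq.gz and {sample}.R2.fq.gz in ‘dir_in’, missing inputs can be caught before launching Snakemake. A minimal sketch, assuming plain Python values for ‘dir_in’ and ‘samplelist’: only the parameter names come from this README, the values are placeholders, and the way run_rnaseq.smk actually stores them may differ. The helper functions are hypothetical.

```python
from pathlib import Path

# Placeholder values; only the names dir_in and samplelist come from the README.
dir_in = "fastq"
samplelist = ["COH000456_D1"]


def expected_fastqs(dir_in, sample):
    """Paths implied by the naming rule {sample}.R1.fq.gz / {sample}.R2.fq.gz."""
    base = Path(dir_in)
    return [base / f"{sample}.R1.fq.gz", base / f"{sample}.R2.fq.gz"]


def missing_fastqs(dir_in, samples):
    """List every expected FASTQ file that does not exist as a regular file."""
    return [
        path
        for sample in samples
        for path in expected_fastqs(dir_in, sample)
        if not path.is_file()
    ]


if __name__ == "__main__":
    for path in missing_fastqs(dir_in, samplelist):
        print("missing:", path)
```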

Run the pipeline

After installing the required software and configuring the parameters, check the workflow with a dry run (-n lists the jobs without executing them; -p prints their shell commands):

snakemake -s run_rnaseq.smk -pn

To run the pipeline with 16 cores:

snakemake -s run_rnaseq.smk -p -j16
