Analysis pipeline for RNA-seq data

This pipeline is designed for the analysis of RNA-seq data and is written in Snakemake. When executed, it produces gene read counts, mutations, gene fusions, and chromosome-level copy number variations (gains/losses) derived from the RNA-seq data.

The workflow of this pipeline:

Dependencies

Please install all the required software before running this pipeline:

Configuration

Configure the pipeline by editing the parameters in the run_rnaseq.smk file.

Parameters:

‘ref_fa’, the FASTA file of the human GRCh38 reference genome. Users need to download it.

‘gtf’, the GTF annotation file for the reference genome. Users need to download it.

‘bed_DUX4’, a BED file of the DUX4 genes, used when patching the read counts of DUX4 genes. Already included in the 0.ref directory.

‘ref_star’, the reference directory used by STAR for alignment. Users will obtain it when installing STAR.

‘ref_fusioncatcher’, the reference directory used by FusionCatcher to call gene fusions. Users will obtain it when installing FusionCatcher.

‘ref_cicero’, the reference directory used by Cicero to call gene fusions. Users will obtain it when installing Cicero.

‘ref_RNApeg_flat’, the refFlat file used by RNApeg. Already included in the 0.ref directory.

‘cores_star’, ‘cores_samtools’, ‘cores_fusioncatcher’, ‘cores_RNApeg’ and ‘cores_cicero’ set the number of threads used by the corresponding software.

‘dir_in’, the directory containing the input FASTQ files. Currently, only gzip-compressed paired-end FASTQ files are supported. File names must follow the pattern {sample}.R1.fq.gz and {sample}.R2.fq.gz. For example, if a sample ID is COH000456_D1, the FASTQ files should be named COH000456_D1.R1.fq.gz and COH000456_D1.R2.fq.gz.

‘dir_out’, the output directory. Results will be stored in sub-directories within this folder, each named according to the respective sample ID.

‘samplelist’, the list of sample IDs to be processed. The corresponding FASTQ files must be stored in the directory ‘dir_in’.
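Because a missing reference or a misnamed FASTQ file typically only surfaces once Snakemake starts resolving inputs, it can help to check the parameters beforehand. The sketch below is a hypothetical helper, not part of this repository: it assumes the parameters are collected into a plain Python dict using the names listed above, and that ‘ref_star’, ‘ref_fusioncatcher’ and ‘ref_cicero’ are directories while the other reference parameters are single files, as their descriptions suggest.

```python
from pathlib import Path

# Hypothetical grouping of the reference parameters described above:
# single files versus directories, following each parameter's description.
REFERENCE_FILES = ("ref_fa", "gtf", "bed_DUX4", "ref_RNApeg_flat")
REFERENCE_DIRS = ("ref_star", "ref_fusioncatcher", "ref_cicero")


def missing_references(params: dict) -> list:
    """Return reference parameter names whose path is unset or does not exist."""
    # A file parameter must point to an existing regular file.
    missing = [key for key in REFERENCE_FILES
               if not params.get(key) or not Path(params[key]).is_file()]
    # A directory parameter must point to an existing directory.
    missing += [key for key in REFERENCE_DIRS
                if not params.get(key) or not Path(params[key]).is_dir()]
    return missing


def fastq_names(sample: str) -> tuple:
    """File names required by the {sample}.R1.fq.gz / {sample}.R2.fq.gz pattern."""
    return f"{sample}.R1.fq.gz", f"{sample}.R2.fq.gz"


def missing_fastqs(samplelist, dir_in) -> dict:
    """Map each sample ID to the FASTQ files it lacks in dir_in.

    Samples with both mates present are omitted, so an empty dict means
    every sample in samplelist is ready to run.
    """
    missing = {}
    for sample in samplelist:
        absent = [name for name in fastq_names(sample)
                  if not (Path(dir_in) / name).is_file()]
        if absent:
            missing[sample] = absent
    return missing
```

For example, missing_fastqs(["COH000456_D1"], "fastq") returns an empty dict only if both COH000456_D1.R1.fq.gz and COH000456_D1.R2.fq.gz exist in the (hypothetical) fastq directory. Within a Snakefile, Snakemake's own glob_wildcards function can alternatively derive sample IDs from files matching {sample}.R1.fq.gz.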

Run the pipeline

After installing all the required software and properly configuring the parameters, a dry run can be executed with:

snakemake -s run_rnaseq.smk -pn

To run the pipeline with 16 cores:

snakemake -s run_rnaseq.smk -p -j16
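When launching the pipeline from another script (for example, to process several configurations in a row), the two commands above can be built as argument lists. This is a hypothetical wrapper, not part of the repository. It reproduces only the flags shown above: -p prints shell commands, -n performs a dry run, and -j sets the number of cores.

```python
import shlex
import subprocess


def snakemake_command(snakefile: str = "run_rnaseq.smk", cores: int = 16,
                      dry_run: bool = False) -> list:
    """Build the snakemake invocation shown above as an argument list."""
    cmd = ["snakemake", "-s", snakefile, "-p"]  # -p: print the shell commands
    if dry_run:
        cmd.append("-n")  # -n: dry run, only show what would be executed
    else:
        cmd.append(f"-j{cores}")  # -j: for local runs, the number of cores to use
    return cmd


def run_pipeline(**kwargs) -> int:
    """Echo and run the command; raises CalledProcessError if snakemake fails."""
    cmd = snakemake_command(**kwargs)
    print(shlex.join(cmd))
    return subprocess.run(cmd, check=True).returncode
```

Note that Snakemake does not give a job more threads than the cores passed with -j, so the ‘cores_*’ values are effectively capped by that number.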

gu-lab20/RNAseq

Folders and files

Latest commit

History

Repository files navigation

Analysis pipeline for RNA-seq data

Dependencies

Please install all the needed softwares before running this pipeline:

Configuration

Run the pipeline

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages