This pipeline is designed for the analysis of RNA-seq data and is written in Snakemake. When run, it produces gene read counts, mutations, fusions, and chromosome-level copy number variations (gains/losses) derived from the RNA-seq data.
The workflow of this pipeline:
To configure the pipeline, edit the run_rnaseq.smk file.
Parameters:
‘ref_fa’, the FASTA file of the human reference genome GRCh38. Users need to download it.
‘gtf’, the GTF annotation file for the reference genome. Users need to download it.
‘bed_DUX4’, bed file of DUX4 genes. This file is used in the read counts patching process for DUX4 genes. Already included in the 0.ref directory.
‘ref_star’, the directory of reference used by STAR to do alignment. Users will get it after the installation of STAR.
‘ref_fusioncatcher’, the directory of reference used by FusionCatcher to call gene fusions. Users will get it after the installation of FusionCatcher.
‘ref_cicero’, the directory of reference used by Cicero to call gene fusions. Users will get it after the installation of Cicero.
‘ref_RNApeg_flat’, the refFlat file used by RNApeg. Already included in the 0.ref directory.
‘cores_star’, ‘cores_samtools’, ‘cores_fusioncatcher’, ‘cores_RNApeg’ and ‘cores_cicero’ are the numbers of threads used by the corresponding software.
‘dir_in’, the directory of input fastq files. Only gz compressed paired-end fastq files are supported currently. The file names should follow the pattern {sample}.R1.fq.gz and {sample}.R2.fq.gz. If a sample id is COH000456_D1, then the fastq file names should be COH000456_D1.R1.fq.gz and COH000456_D1.R2.fq.gz.
‘dir_out’, the output directory. Results will be stored in sub-directories within this folder, each named according to the respective sample ID.
‘samplelist’, the list of sample IDs to process. The corresponding fastq files must be stored in the directory ‘dir_in’.
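Before starting a run, it can help to confirm that every sample in ‘samplelist’ has both fastq files in ‘dir_in’. The following is a minimal sketch in plain Python and is not part of the pipeline. The helper names (expected_fastqs, missing_inputs, sample_output_dir) and the example values are hypothetical; it only assumes the {sample}.R1.fq.gz / {sample}.R2.fq.gz naming pattern and the per-sample output sub-directories described above.

```python
from pathlib import Path


def expected_fastqs(dir_in, sample):
    """Return the R1/R2 paths implied by the {sample}.R1.fq.gz naming pattern."""
    base = Path(dir_in)
    return base / f"{sample}.R1.fq.gz", base / f"{sample}.R2.fq.gz"


def missing_inputs(dir_in, samplelist):
    """List every expected fastq file that is not present in dir_in."""
    missing = []
    for sample in samplelist:
        for path in expected_fastqs(dir_in, sample):
            if not path.is_file():
                missing.append(path)
    return missing


def sample_output_dir(dir_out, sample):
    """Per-sample results folder: a sub-directory of dir_out named after the sample ID."""
    return Path(dir_out) / sample


if __name__ == "__main__":
    # Hypothetical values; replace with the ones set in run_rnaseq.smk.
    dir_in = "fastq"
    samplelist = ["COH000456_D1"]
    for path in missing_inputs(dir_in, samplelist):
        print(f"missing: {path}")
```

If the script prints nothing, all expected input files were found. Any path it reports should be fixed (or the sample removed from ‘samplelist’) before running Snakemake.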
After installing all the required software and properly configuring the parameters, a dry run can be executed with:
snakemake -s run_rnaseq.smk -pn
To run the pipeline with 16 cores:
snakemake -s run_rnaseq.smk -p -j16