Snakemake workflow designed to perform RNA-Seq transcriptome expression estimation with Salmon

tdayris/fair_rnaseq_salmon_quant

Snakemake workflow used to estimate transcript and gene abundance with Salmon.

Usage

The usage of this workflow is described in the Snakemake workflow catalog. It is also available locally on a single page.

Results

A complete description of the results can be found here in workflow reports.

Material and Methods

The tools used in this pipeline are described here textually.

┌──────────────────────┐      ┌─────────────────────┐
│ fair_genome_indexer  │      │ fair_fastqc_multiqc │
└───────┬──────────────┘      └─────┬─────────┬─────┘
        │                           │         │      
        │                           │         │      
  ┌─────▼──────┐                 ┌──▼───┐     │      
  │salmon_index│   ┌─────────────┤fastp │     │      
  └─────┬──────┘   │             └──┬───┘     │      
        │          │                │         │      
        │          │                │         │      
 ┌──────▼───────┐  │                │         │      
 │ salmon_quant ◄──┘        ┌───────▼───────┐ │      
 └──────────┬───┘           │               │ │      
            │               │               │ │      
            └───────────────►    MultiQC    ◄─┘      
                            │               │        
                            └────────▲──────┘        
  ┌─────────┐                        │               
  │datavzrd ├────────────────────────┘               
  └─────────┘                                        

Genome sequences and indexes with fair_genome_indexer

Get DNA sequences

| Step | Commands |
| --- | --- |
| Download DNA FASTA from Ensembl | ensembl-sequence |
| Remove non-canonical chromosomes | pyfaidx |
| Index DNA sequence | samtools |
| Create sequence dictionary | picard |
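
The chromosome filtering step can be pictured with a small plain-Python sketch. It is not the pyfaidx invocation used by the workflow, and the `CANONICAL` set and `keep_canonical` function are hypothetical names chosen for illustration; the actual list of canonical sequences depends on the organism and on the workflow configuration.

```python
from typing import Iterable, Iterator, Set

# Hypothetical set of canonical sequence names (human-like naming, assumed here).
CANONICAL: Set[str] = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def keep_canonical(
    fasta_lines: Iterable[str], canonical: Set[str] = CANONICAL
) -> Iterator[str]:
    """Yield only the FASTA records whose sequence name is in `canonical`."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token of the header.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in canonical
        if keep:
            yield line


# Tiny in-memory example: one scaffold sits between two canonical sequences.
example = [
    ">1 dna:chromosome",
    "ACGT",
    ">KI270728.1 dna:scaffold",
    "TTTT",
    ">MT dna:chromosome",
    "GGCC",
]
filtered = list(keep_canonical(example))
```

In this example, `filtered` keeps the records for `1` and `MT` and drops the scaffold record.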

Get genome annotation (GTF)

| Step | Commands |
| --- | --- |
| Download GTF annotation | ensembl-annotation |
| Fix format errors | Agat |
| Remove non-canonical chromosomes, based on above DNA Fasta | Agat |
| Remove <NA> Transcript support levels | Agat |
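
To make the last filtering step concrete, here is a rough plain-Python sketch of dropping GTF features whose `transcript_support_level` attribute is `NA`. It does not reproduce the Agat command used by the workflow. The function name `drop_na_tsl` is hypothetical, and keeping features that carry no `transcript_support_level` attribute at all (such as gene lines) is an assumption of this sketch.

```python
import re
from typing import Iterable, Iterator

# Matches the value of the transcript_support_level attribute in GTF column 9.
TSL_PATTERN = re.compile(r'transcript_support_level "([^"]*)"')


def drop_na_tsl(gtf_lines: Iterable[str]) -> Iterator[str]:
    """Yield GTF lines, skipping features annotated with TSL "NA"."""
    for line in gtf_lines:
        if line.startswith("#"):
            # Header and comment lines are passed through unchanged.
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        attributes = columns[8] if len(columns) > 8 else ""
        match = TSL_PATTERN.search(attributes)
        if match and match.group(1).strip() == "NA":
            continue
        # Assumption: features without the attribute are kept.
        yield line


example = [
    "#!genome-build GRCh38",
    '1\thavana\ttranscript\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; transcript_support_level "1";',
    '1\thavana\ttranscript\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; transcript_support_level "NA";',
    '1\thavana\tgene\t100\t400\t.\t+\t.\tgene_id "G1";',
]
kept = list(drop_na_tsl(example))
```

Here `kept` contains the header, transcript `T1`, and the gene line, while transcript `T2` is removed.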

Quality controls

| Step | Wrapper |
| --- | --- |
| FastQC | fastqc-wrapper |
| MultiQC | multiqc-wrapper |

Read abundance estimation with salmon-tximport meta-wrapper

Indexation

| Step | Wrapper |
| --- | --- |
| Create gentrome and decoy sequences | generate-decoy |
| Index decoy aware gentrome | salmon-index |

┌───────────────────────────────────────┐   ┌─────────────────────────────────────────────┐
│Genome sequences  (fair_genome_indexer)│   │Transcriptome sequences (fair_genome_indexer)│
└────────────────┬──────────────────────┘   └─────────┬───────────────────────────────────┘
                 │                                    │                                    
                 │                                    │                                    
┌────────────────▼────────────────────────────────────▼─────┐                              
│Gentrome creation and decoy sequences identification (bash)│                              
└────────────────┬──────────────────────────────────────────┘                              
                 │                                                                         
                 │                                                                         
┌────────────────▼──────────────────┐                                                      
│Decoy aware gentrome index (Salmon)│                                                      
└───────────────────────────────────┘                                                      
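
The gentrome creation step is done in bash in this workflow. The Python sketch below only illustrates the idea, following the decoy-aware approach described in the Salmon documentation: the names of the genome sequences form the decoy list, and the gentrome is the transcriptome followed by the genome. The function name `build_gentrome` and the example sequence names are hypothetical.

```python
from typing import Iterable, List, Tuple


def build_gentrome(
    transcriptome: Iterable[str], genome: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (gentrome_lines, decoy_names) from two FASTA line iterables."""
    genome_lines = [line.rstrip("\n") for line in genome]
    # Decoy names are the first token of each genome FASTA header.
    decoys = [
        line[1:].split()[0]
        for line in genome_lines
        if line.startswith(">") and line[1:].split()
    ]
    # Transcripts come first, genome sequences (the decoys) come last.
    gentrome = [line.rstrip("\n") for line in transcriptome] + genome_lines
    return gentrome, decoys


transcripts = [">ENST0001 cdna", "ACGTACGT"]
genome = [">1 dna:chromosome", "ACGTACGTTTTT", ">2 dna:chromosome", "GGGGCCCC"]
gentrome, decoys = build_gentrome(transcripts, genome)
```

In a real run, the two results would be written to a gentrome FASTA file and a decoy list file, which are then given to the Salmon indexing step.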

Abundance estimation

| Step | Wrapper |
| --- | --- |
| Trim raw reads | fastp |
| Estimate abundances | salmon-quant |
| Aggregate counts in TSV | in-house script |
| Aggregate counts in R | tximport |

┌───────────────────┐                      ┌───────────────────────────────────┐
│fair_fastqc_multiqc│                      │Decoy aware gentrome index (Salmon)│
└────────┬──────────┘                      └───────┬───────────────────────────┘
         │                                         │                            
         │                                         │                            
┌────────▼───────┐                         ┌───────▼───────────────────────┐
│Trimming (fastp)├─────────────────────────► Abundance estimation (Salmon) │
└────────────────┘                         └───────┬───────────────────────┘
                                                   │                            
         ┌─────────────────────────────────────────┤                            
         │                                         │                            
┌────────▼────────────────────────┐        ┌───────▼───────────────────────────┐
│Count aggregation in R (tximport)│        │Count aggregation in CSV (in_house)│
└─────────────────────────────────┘        └───────────────────────────────────┘
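
The in-house aggregation script is not shown here. As a rough illustration of what aggregating Salmon results into one table can look like, the sketch below merges the `NumReads` column of several `quant.sf` tables into one tab-separated table with one column per sample. The column names `Name`, `TPM` and `NumReads` are those of Salmon's `quant.sf` output. The function names, the sample names, and the choice to fill missing transcripts with 0 are assumptions of this sketch.

```python
import csv
import io
from typing import Dict, Iterable, Mapping


def read_quant(handle: Iterable[str], column: str = "NumReads") -> Dict[str, float]:
    """Read one quant.sf table and map each transcript name to `column`."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row[column]) for row in reader}


def aggregate(samples: Mapping[str, Dict[str, float]]) -> str:
    """Build a tab-separated table: one row per transcript, one column per sample."""
    sample_ids = list(samples)
    names = sorted({name for quant in samples.values() for name in quant})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Name", *sample_ids])
    for name in names:
        # Assumption: a transcript missing from a sample is reported as 0.
        writer.writerow([name, *(samples[s].get(name, 0.0) for s in sample_ids)])
    return out.getvalue()


header = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
quant_a = header + "T1\t1000\t850.0\t12.5\t10\nT2\t500\t350.0\t0.0\t0\n"
quant_b = header + "T1\t1000\t850.0\t4.0\t3.5\n"

samples = {
    "sampleA": read_quant(io.StringIO(quant_a)),
    "sampleB": read_quant(io.StringIO(quant_b)),
}
table = aggregate(samples)
```

Passing `column="TPM"` to `read_quant` would aggregate normalized abundances instead of estimated read counts.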