Material and methods

Genome DNA sequences and annotations were downloaded from Ensembl. Pyfaidx [1] was used to filter out non-canonical chromosomes. Agat [2] was used to correct common issues found in Ensembl genome annotation files, filter out non-canonical chromosomes, and remove transcripts with a transcript support level (TSL) equal to NA. Samtools [3] and Picard [4] were used to index genome sequences.
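
The filtering described above can be illustrated with a minimal Python sketch that uses only the standard library. It does not call Pyfaidx or Agat and is not the pipeline's own code. The set of canonical names (`CANONICAL`, here human chromosomes 1-22, X, Y and MT in Ensembl naming), the helper names `filter_fasta` and `filter_gtf`, and the assumption that the TSL is stored in a `transcript_support_level` GTF attribute are all illustrative assumptions.

```python
import re

# Assumption: "canonical" means human chromosomes 1-22, X, Y and MT (Ensembl naming).
CANONICAL = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}

# Assumption: the TSL is stored as a GTF attribute such as: transcript_support_level "NA"
TSL_NA = re.compile(r'transcript_support_level "NA')


def filter_fasta(lines, keep=CANONICAL):
    """Yield only the FASTA lines that belong to sequences named in `keep`."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first word after '>'.
            parts = line[1:].split()
            keeping = bool(parts) and parts[0] in keep
        if keeping:
            yield line


def filter_gtf(lines, keep=CANONICAL):
    """Yield GTF lines located on kept chromosomes whose TSL is not NA."""
    for line in lines:
        if line.startswith("#"):
            yield line  # header comments are passed through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[0] not in keep:
            continue  # malformed record or non-canonical chromosome
        if TSL_NA.search(fields[8]):
            continue  # record annotated with a TSL of NA
        yield line
```

Both helpers are generators, so they can be applied to an open file handle without loading a whole genome or annotation into memory.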

Raw fastq file quality was assessed with FastQC [5]. Raw fastq files were trimmed using Fastp [6]. Cleaned reads were aligned to the indexed Ensembl genome with Bowtie2 [7]. Sambamba [8] was used to sort, filter, mark duplicates, and compress aligned reads. Quality controls were performed on cleaned, sorted, deduplicated aligned reads using Picard [4] and Samtools [3]. Additional quality assessments were performed with RSeQC [9], NGSderive [10], GOleft [11], and Mosdepth [12]. Quality reports produced during both trimming and mapping steps were aggregated with MultiQC [13].
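
As an illustration of the trimming and alignment steps, the sketch below builds the corresponding command lines as Python lists without executing them. It assumes paired-end reads and uses only basic Fastp and Bowtie2 options; the function names, file names, and thread count are hypothetical, and the parameters actually used by the pipeline are not stated here.

```python
def trimming_command(r1, r2, out1, out2):
    """Build a Fastp call for paired-end reads (the command is not executed)."""
    return ["fastp", "-i", r1, "-I", r2, "-o", out1, "-O", out2]


def alignment_command(index_prefix, r1, r2, sam, threads=1):
    """Build a Bowtie2 call that aligns paired-end reads and writes a SAM file."""
    return [
        "bowtie2",
        "-p", str(threads),   # number of alignment threads
        "-x", index_prefix,   # prefix of the Bowtie2 genome index
        "-1", r1,             # first mates
        "-2", r2,             # second mates
        "-S", sam,            # output SAM file
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed; keeping the commands as lists avoids shell quoting issues.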

On user demand, alignments are sieved using Deeptools [14].
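
To give an idea of what sieving alignments means, here is a minimal Python sketch that filters SAM records by mapping quality and fragment length. It is not the Deeptools implementation; the function name `sieve_sam` and the thresholds are assumptions chosen for illustration, and the criteria applied by the pipeline are set by the user.

```python
def sieve_sam(lines, min_mapq=30, max_fragment=None):
    """Yield SAM header lines and the records that pass both filters."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines are kept unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])        # SAM column 5: mapping quality
        tlen = abs(int(fields[8]))   # SAM column 9: observed template length
        if mapq < min_mapq:
            continue
        if max_fragment is not None and tlen > max_fragment:
            continue
        yield line
```

For example, `sieve_sam(handle, min_mapq=30, max_fragment=500)` would keep only confidently mapped reads from fragments of at most 500 bases.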

The whole pipeline was powered by Snakemake [15]. This pipeline is freely available on Github; details about installation, usage, and results can be found on the Snakemake workflow page.

Authors: Thibault Dayris
Version: Unchanged since 4.0.0 of 07/16/2024