dieterich-lab/nmd-wf

NMD workflow

This repository exists for reproducibility purposes. The data generated by this workflow powers NMDtxDB. Raw data are available at the SRA under accession PRJNA1054031. RNA-seq reads must be pre-processed and aligned before they are used as input.

Workflow description

The workflow comprises two parts. The first part is a Snakemake workflow (workflow). The second part performs CDS detection and integration.

Usage

Part 1

This part runs the workflow that generates the de novo transcriptome and computes differential gene expression (DGE) and differential transcript expression (DTE).

snakemake --jobs 10 --cores 10 --profile slurm --printshellcmds --reason --use-singularity --use-conda --use-envmodules

To produce the DAG:

snakemake --rulegraph | dot -Tsvg > rulegraph.svg
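
Before submitting jobs to the cluster, it can help to preview what Snakemake would run. The following is a minimal sketch, not part of this repository: the `preview_workflow` function name is hypothetical, while `--dry-run` and `--printshellcmds` are standard Snakemake options. It assumes it is called from the directory that holds the Snakefile.

```shell
# Hypothetical helper: list the jobs Snakemake would run, without executing them.
# --dry-run and --printshellcmds are standard Snakemake options; any extra
# arguments (for example --profile slurm) are passed through unchanged.
preview_workflow() {
    snakemake --dry-run --printshellcmds "$@"
}

# Usage (from the directory that holds the Snakefile):
#   preview_workflow --profile slurm
```

Wrapping the call in a function keeps the preview separate from the real submission command, so the two cannot be confused in a shell history.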

Part 2

This part covers CDS detection. Here is an example using sequences trimmed at the Ensembl start codon:

awk '{ print $1 "\t" $7-1 "\t" $8 "\t" $4 "\t" 1 "\t" $6; }' GRCh38.102.gtf > ref_cds.bed

Rscript cds/StartATG_to_cDNA.R ref_cds.bed

perl longorf2_fwd_v2.pl --input GRCh38.102.fa --startcodon ref_cds_cDNA.bed > ensembl_longorf2.fa 
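
The awk step above reads fields 7 and 8 as coordinates and subtracts 1 from the start, which matches the BED convention of 0-based starts and exclusive ends. The sketch below applies the same awk program to a single made-up record to show the effect. The input layout (tab-separated, with a 1-based inclusive start in field 7 and end in field 8) and the file names `demo_input.tsv` and `demo_cds.bed` are assumptions for illustration only. They are not taken from the real input file.

```shell
# Hypothetical one-line input: fields 7 and 8 hold a 1-based, inclusive
# start and end (an assumed layout, used only for this demonstration).
printf 'chr1\tsrc\tCDS\tENST_demo\t.\t+\t100\t102\n' > demo_input.tsv

# Same awk program as above: chrom, start-1, end, name, score (fixed at 1), strand.
awk '{ print $1 "\t" $7-1 "\t" $8 "\t" $4 "\t" 1 "\t" $6; }' demo_input.tsv > demo_cds.bed

# Prints the tab-separated fields: chr1, 99, 102, ENST_demo, 1, +
cat demo_cds.bed
```

The three bases at 1-based positions 100 to 102 become the BED interval starting at 99 and ending at 102, so the interval length (end minus start) is still 3.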

See the longorf_integration_bed12 script, which details how the multiple sources are integrated.

To retrieve the other sources:

wget https://ftp.ebi.ac.uk/pub/databases/gencode/riboseq_orfs/data/Ribo-seq_ORFs.bed
wget https://api.openprot.org/api/2.0/HS/downloads/human-openprot-2_0-refprots+altprots+isoforms-uniprot2017_03_07.bed.zip

License

This project is licensed under the MIT License.

Funding

This work was supported by the DFG Research Infrastructure West German Genome Center, project 407493903, as part of the Next-Generation Sequencing Competence Network, project 423957469.
