rawgene/cwl

A repo for tools and workflows used by RAWG

This repo contains the RNA-seq analysis tools we have packaged as CWL scripts. Below is a list of all the tools we currently have, with the input and output names of each CWL script.
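Here is a minimal sketch of how these input names are used in practice. A CWL runner such as cwltool takes a .cwl script plus a job-order file (YAML or JSON) that gives a value for each input. The Python below builds such a file for hisat2_align, using the input names documented in the list below. Several details are assumptions for illustration only: the paths, the job file name, and the input types (fastq files as CWL `File` objects, index_directory as a `Directory`). Check the `type:` of each input in the actual .cwl file.

```python
import json


def file_input(path):
    """Wrap a path as a CWL File object for a job-order file."""
    return {"class": "File", "path": path}


def dir_input(path):
    """Wrap a path as a CWL Directory object for a job-order file."""
    return {"class": "Directory", "path": path}


# Hypothetical job for hisat2_align. The input names follow the list in
# this README; the paths and the File/Directory types are assumptions.
job = {
    "threads": 4,
    "index_directory": dir_input("refs/hisat2_index"),
    "first_pair": file_input("reads/sample1_R1.fastq.gz"),
    "second_pair": file_input("reads/sample1_R2.fastq.gz"),
    "output": "sample1",
    "XSTag": "--dta",  # or "--dta-cufflinks" when the output feeds cufflinks
}

# Write the job-order file as JSON (cwltool also accepts YAML).
with open("hisat2_align_job.json", "w") as fh:
    json.dump(job, fh, indent=2)
```

You could then run it with something like `cwltool hisat2_align.cwl hisat2_align_job.json`. The .cwl file name here is also assumed.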

Mini documentation for each tool

We have made our best effort to standardise input and output names, but there is still plenty of room for improvement. Suggestions and comments are very welcome!

Inputs and outputs for each CWL script

  • ballgown

    inputs:

    input_script: R ballgown script.
    tablemaker_output: directory of the tablemaker output
    metadata: metadata csv file.
    condition: string giving the condition of interest in the metadata columns

    outputs:

    DGE_res.csv
    DTE_res.csv

  • cuffdiff

    inputs:

    output: name for output directory
    threads: number of cores to use
    label: the factors of the condition of interest
    FDR: the FDR threshold at which to label results as significant; could default to 0.05.
    merged_gtf: merged gtf annotation file in gtf format
    condition1_files: cxb files (cuffquant output) of all samples for condition 1
    condition2_files: cxb files (cuffquant output) of all samples for condition 2

    outputs:

    output: directory named after inputs.output, containing all cuffdiff output files

    (To do: could add a gene_exp.diff-only output.)

  • cufflinks

    inputs:

    gtf: annotation file in gtf format
    output: name for output directory
    threads: number of threads to use.
    bam: bam file

    outputs:

    output: directory named after output_dir, containing all cufflinks output files
    transcripts.gtf

  • cuffmerge

    inputs:

    output: string stating the output directory name
    gtf: annotation file in gtf format
    threads: number of threads to use.
    fasta: fasta file used in indexing
    cufflinks_output: transcripts.gtf files generated by cufflinks

    outputs:

    output: output directory named after inputs.output
    merged.gtf

  • cuffnorm

    inputs:

    output: name for output directory
    threads: number of threads to use.
    merged_gtf: merged gtf annotation file in gtf format
    condition1_files: cxb files (cuffquant output) of all samples for condition 1
    condition2_files: cxb files (cuffquant output) of all samples for condition 2

    outputs:

    output: named after inputs.output, containing the normalised gene count matrix

  • cuffquant

    inputs:

    output: name for output directory
    threads: number of threads to use.
    merged_gtf: merged gtf annotation file in gtf format
    bam: bam file

    outputs:

    output: directory named after inputs.output, containing all quant files
    abundances.cxb

  • DESeq2

    inputs:

    input_script: R DESeq2 script.
    count_matrix: gene count matrix file
    metadata: metadata csv file.

    outputs:

    DGE_results.csv

  • DEXSeq

    inputs:

    input_script: R DEXSeq script
    count_matrix: exon count matrix file
    gff: directory containing the gff file from htseq_prepare
    metadata: metadata csv file.
    threads: number of threads to use

    outputs:

    DEE_results.csv

  • edger

    inputs:

    input_script: R edger script
    condition: string giving the condition of interest in the metadata columns
    count_matrix: gene counts matrix
    metadata: metadata csv file.

    outputs:

    DGE_res.csv

  • featurecounts

    inputs:

    input_script: R featurecounts script
    bam_files: all bam files
    gtf: annotation file in gtf format
    threads: number of threads to use.
    metadata: metadata csv file, used to determine the libType of each sample.

    outputs:

    gene_count_matrix.csv

  • fgsea

    inputs:

    input_script: R fgsea script
    de_results: DGE results file
    gene_set: file containing gene set information in long form.

    outputs:

    gsea_res.csv

  • hisat2_align

    inputs:

    threads: number of threads
    index_directory: path to hisat2 index
    first_pair: first fastq file
    second_pair: second fastq file
    output: output name
    XSTag: XS tag option to use ("--dta" or "--dta-cufflinks")

    outputs:

    hisat2_align_out: all output files
    sam_output: sam files

  • hisat2_build

    inputs:

    fasta: fasta file
    threads: number of threads to use.
    output: name to use for output

    outputs:

    output: index files with the basename given by inputs.output
    log

  • htseq_count

    inputs:

    input_script: python htseq_count script
    pairedend: logical
    stranded: logical
    input_format: bam or sam
    sorted_by: sort order of the alignments, e.g. pos
    gff: gff file from htseq_prepare
    bam: bam or sam file
    outname: name for output file

    outputs:

    outname: output file named after inputs.outname

  • htseq_prepare

    inputs:

    input_script: python htseq_prepare script
    gtf: gtf annotation file
    gff_name: name for gff output

    outputs:

    gff_name: gff file named after inputs.gff_name

  • hypergeo

    inputs:

    input_script: R HyperGeo_Script script
    de_res: DGE results file
    gene_set: file containing gene set information in long form.

    outputs:

    hypergeo_res.csv

  • miso_index

    inputs:

    gtf: gtf file
    output: name for output directory

    outputs:

    index_dir: output directory named after inputs.index_dir

  • miso_run

    inputs:

    threads: number of threads to use
    lib_type: whether the library is paired-end

    (To do: rename this input to pairedend, or rename the equivalent input of the other tools to lib_type, for consistency.)

    index_directory: directory output by miso_index
    bam: bam file
    read_len: read length
    gtf: gtf file
    min_exon_size: minimum exon size to use
    output: output name

    outputs:

    out_dir: directory named after inputs.out_dir

  • prepDE

    inputs:

    input_script: python prepDE script
    stringtie_out: gtf files from stringtie

    outputs:

    gene_count_matrix.csv
    transcript_count_matrix.csv

  • salmon_count

    inputs:

    input_script: R count script
    gtf: annotation file in gtf format
    metadata: metadata csv file
    quant_results: directory with subdirectories of outputs from salmon_quant

    outputs:

    gene_abundance_matrix.csv
    gene_count_matrix.csv (this one is used for DGE analysis, e.g. DESeq2)
    gene_length_matrix.csv

  • salmon_index

    inputs:

    fasta: fasta files
    index_type: type of index to build, either "fmd" or "quasi"
    threads: number of threads to use
    output: name for output directory

    outputs:

    index_name: output directory with name inputs.index_name

  • salmon_quant

    inputs:

    index_directory: directory of salmon index
    threads: number of threads to use
    output: output name
    first_end_fastq: if paired-end, the first pair fastq file
    second_end_fastq: if paired-end, the second pair fastq file
    single_fastq: if single-end, the single fastq file

    outputs:

    out_dir: Directory with name of inputs.out_dir

  • samtools

    inputs:

    action: samtools command (default is sort)
    sortby: how to sort the bam file
    threads: number of additional threads to use (i.e. total threads - 1)
    samfile: sam file
    outfilename: output file name

    outputs:

    outfilename: bam file with name inputs.outfilename

  • STAR_index

    inputs:

    threads: number of threads
    Mode: how to run STAR (use genomeGenerate)
    output: directory name to use
    fasta: fasta files
    gtf: gtf file

    outputs:

    index_out: index files

  • STAR_readmap

    inputs:

    threads: number of threads to use
    genomeDir: directory containing the STAR index (STAR_index output)
    readFilesIn: fastq files
    outFileNamePrefix: output name for directory

    outputs:

    star_read_out: all output files
    sam_output: sam file

  • stringtie

    inputs:

    bam: bam file
    threads: number of threads to use.
    gtf: gtf file
    output: output name for file

    outputs:

    outfilename: output file named after inputs.outfilename

  • tablemaker

    inputs:

    threads: number of threads to use.
    merged_gtf: merged gtf from cuffmerge
    bam: bam files to use
    output: output name for directory.

    outputs:

    output: directory named after inputs.output
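Several inputs above are documented as another tool's output. For example, cuffquant's cxb files feed cuffdiff, cuffmerge's merged gtf feeds cuffquant and tablemaker, and tablemaker output feeds ballgown. The sketch below encodes some of these links as a dependency graph and uses Python's standard-library `graphlib` (Python 3.9+) to derive a valid run order. The wiring is an illustrative assumption pieced together from the descriptions, not a workflow file from this repo. In particular, it assumes samtools is used to turn hisat2's sam output into the bam files the downstream tools take.

```python
from graphlib import TopologicalSorter

# Each tool maps to the set of tools whose outputs it consumes, following the
# input descriptions in this README. Illustrative wiring only (an assumption).
DEPENDS_ON = {
    "hisat2_align": {"hisat2_build"},       # needs the hisat2 index
    "samtools": {"hisat2_align"},           # sam -> sorted bam (assumed use)
    "stringtie": {"samtools"},
    "prepDE": {"stringtie"},                # gtf files from stringtie
    "DESeq2": {"prepDE"},                   # gene_count_matrix.csv
    "cufflinks": {"samtools"},
    "cuffmerge": {"cufflinks"},             # transcripts.gtf -> merged.gtf
    "cuffquant": {"cuffmerge", "samtools"},
    "cuffdiff": {"cuffmerge", "cuffquant"}, # merged gtf + cxb files
    "tablemaker": {"cuffmerge", "samtools"},
    "ballgown": {"tablemaker"},
}


def run_order(graph):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(DEPENDS_ON)
print(order)
```

`static_order()` only guarantees that each tool comes after everything it depends on. Tools in independent branches (e.g. stringtie and cufflinks) may appear in either relative order and could be run in parallel.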