Pipette: A framework for Bioinformatic pipelines

Pipette is a framework for quickly building pipelines that process input in multiple steps. It is especially suited to bioinformatics applications.

Pipette also includes a collection of pipelines built with the framework. Current Pipette pipelines:

variant: a variant calling and annotation pipeline.

How it works

Pipette pipelines are made of steps. Each step is independent of the others and defines two attributes:

input: The input parameters the step needs to run.
output: The parameters the step will generate after running.

Pipette chains these steps together, passing in input parameters and collecting output parameters automatically.

To create a new Pipeline in Pipette, simply sub-class the Pipeline class.
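The chaining idea described above can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: the `Step`, `Pipeline`, and `ToyVariantPipeline` classes below, and the `step` helper, are hypothetical names invented for this example and are not Pipette's actual API. The steps only rename files; no real tools are run.

```ruby
# Hypothetical sketch: these classes are NOT Pipette's real API.
class Step
  attr_reader :name, :inputs, :outputs

  def initialize(name, inputs:, outputs:, &action)
    @name = name
    @inputs = inputs
    @outputs = outputs
    @action = action
  end

  # Run the step, handing it only the parameters it declared as input
  # and keeping only the parameters it declared as output.
  def run(params)
    missing = inputs - params.keys
    raise ArgumentError, "#{name} is missing #{missing.inspect}" unless missing.empty?

    result = @action.call(params.slice(*inputs))
    result.slice(*outputs)
  end
end

class Pipeline
  def initialize
    @steps = []
  end

  # Register a step with its declared inputs and outputs.
  def step(name, inputs:, outputs:, &action)
    @steps << Step.new(name, inputs: inputs, outputs: outputs, &action)
    self
  end

  # Chain the steps: each step's outputs are merged into the shared
  # parameter hash so that later steps can use them as inputs.
  def run(params)
    @steps.reduce(params) { |acc, s| acc.merge(s.run(acc)) }
  end
end

# A toy sub-class with two steps that only derive file names.
class ToyVariantPipeline < Pipeline
  def initialize
    super
    step(:call, inputs: [:bam, :reference], outputs: [:vcf]) do |p|
      { vcf: p[:bam].sub(/\.bam\z/, ".vcf") }
    end
    step(:filter, inputs: [:vcf], outputs: [:filtered_vcf]) do |p|
      { filtered_vcf: p[:vcf].sub(/\.vcf\z/, ".filtered.vcf") }
    end
  end
end

result = ToyVariantPipeline.new.run(bam: "aligned.fly.bam", reference: "dm.fasta")
```

Because each step names its inputs and outputs explicitly, a missing parameter can be reported before the step runs, and a step never sees parameters it did not ask for.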

Tools

Pipette also includes a number of tools that perform the bulk of the processing for each step. These tools are simple wrappers around existing bioinformatics programs, along with some custom processing code.
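As a rough idea of what a thin wrapper around an external program might look like, here is a minimal sketch. The `SamtoolsIndex` class is a hypothetical name made up for this example and does not come from Pipette's code; the only external fact it relies on is that `samtools index <file>` is a valid samtools invocation. The binary path is an assumed location.

```ruby
require "shellwords"

# Hypothetical wrapper class; not part of Pipette's actual code.
class SamtoolsIndex
  def initialize(bin_path = "samtools")
    @bin_path = bin_path
  end

  # Build the argument list for `samtools index <bam_file>`.
  def command(bam_file)
    [@bin_path, "index", bam_file]
  end

  # Shell-escaped string form of the command, useful for logging.
  def command_line(bam_file)
    Shellwords.join(command(bam_file))
  end

  # Execute the tool. Returns true when the command exits successfully.
  def run(bam_file)
    system(*command(bam_file))
  end
end

tool = SamtoolsIndex.new("/usr/local/bin/samtools")
line = tool.command_line("my sample.bam")
```

Keeping command construction separate from execution makes the wrapper easy to test without the external program installed.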

Requirements

Pipette is written in Ruby. Each pipeline also expects certain external tools to be installed; see below for the tools each pipeline requires.

Pipette Pipelines:

Variant calling and annotation pipeline

A pipeline that automates finding variants (SNPs and indels) in an aligned BAM file.

Prerequisites

Currently, this pipeline does not perform the initial alignment step. It starts with an indexed BAM file and a reference genome.

Requirements

This pipeline wraps calls to a number of external tools that perform the variant calling and annotation.

Current Applications Expected by vp:

GATK – performs most of the SNP/indel calling steps using GATK’s Unified Genotyper
samtools – for indexing output from GATK
snpEff – for annotation

Configuration Options

./variant_pipeline.rb -h
Usage: variant_pipeline [options]
    -i, --input BAM_FILE             REQUIRED - Input BAM file to call SNPs on
    -r, --reference FA_FILE          REQUIRED - Reference Fasta file for genome
    -o, --output PREFIX              Output prefix to use for generated files
    -j, --cores NUM                  Specify number of cores to run GATK on. Default: 4
    -c, --recalibrate COVARIATE_FILE If provided, recalibration will occur using input covariate file. Default: recalibration not performed
    -a, --annotate GENOME            Annotate the SNPs and Indels using Ensembl based on input GENOME. Example Genome: FruitFly
        --gatk JAR_FILE              Specify GATK installation
        --snpeff JAR_FILE            Specify snpEff Jar location
        --snpeff_config CONFIG_FILE  Specify snpEff config file location
        --samtools BIN_PATH          Specify location of samtools
    -q, --quiet                      Turn off some output
    -s realign,recalibrate,call,filter,annotate,
        --steps                      Specify only which steps of the pipeline should be executed
    -y, --yaml YAML_FILE             Yaml configuration file that can be used to load options. Command line options will trump yaml options
    -h, --help                       Displays help screen, then exits
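For readers curious how options like these can be parsed in Ruby, the sketch below uses the standard library's `OptionParser` for a small subset of the flags. It is an illustration only, not the pipeline's actual parsing code; the `parse_options` helper and the shape of the resulting hash are assumptions made for this example.

```ruby
require "optparse"

# Hypothetical helper that parses a subset of the flags shown in the
# usage text. Not taken from the pipeline's source.
def parse_options(argv)
  options = { cores: 4 } # mirrors the documented default for --cores
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: variant_pipeline [options]"
    opts.on("-i", "--input BAM_FILE", "Input BAM file") { |v| options[:input] = v }
    opts.on("-r", "--reference FA_FILE", "Reference Fasta file") { |v| options[:reference] = v }
    opts.on("-j", "--cores NUM", Integer, "Number of cores") { |v| options[:cores] = v }
    # The Array type splits a comma-separated value into a list of strings.
    opts.on("-s", "--steps x,y,z", Array, "Steps to run") { |v| options[:steps] = v }
  end
  parser.parse(argv)
  options
end

parsed = parse_options(%w[-i aligned.fly.bam -r dm.fasta -s call,filter])
```

Here `parsed[:steps]` holds the list of requested steps, and `parsed[:cores]` falls back to 4 because `-j` was not given.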

Run Example

The easiest way to run the variant pipeline is to create a YAML config file that stores most of the required settings, then run the pipeline with the -y flag.

sample_config.yml

reference: "~/genomes/Drosophila_melanogaster.BDGP5.4.54.dna.fasta"
input: "./aligned.fly.bam"
output: aligned.fly 
annotate: dm5.34
gatk: "~/tools/GATK/GenomeAnalysisTK.jar"
snpeff: "~/tools/snpEff/snpEff.jar"
snpeff_config: "~/tools/snpEff/snpEff.config"

Then simply run variant_pipeline with the -y flag:

./variant_pipeline/pipette.rb variant -y ./sample_config.yml
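The usage text states that command line options trump YAML options. One simple way to get that behaviour is to load the YAML file first and merge the command-line values on top, as sketched below. The `merge_config` helper is a hypothetical name for this example, not the pipeline's own code, and the YAML text is a shortened copy of the sample config.

```ruby
require "yaml"

# Hypothetical helper: merge YAML settings with command-line options,
# letting the command-line values win on conflict.
def merge_config(yaml_text, cli_options)
  yaml_options = YAML.safe_load(yaml_text) || {}
  yaml_options.transform_keys(&:to_sym).merge(cli_options)
end

yaml_text = <<~YAML
  input: "./aligned.fly.bam"
  output: aligned.fly
  annotate: dm5.34
YAML

# Simulate passing "-o run2" on the command line.
config = merge_config(yaml_text, { output: "run2" })
```

In this sketch `config[:output]` comes from the command line, while `config[:input]` and `config[:annotate]` still come from the YAML file.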
