metalhelix/spikein_analysis

Spikein Analysis Script

A very basic script for analyzing ERCC controls in RNA-seq data

About

Input:

  • directory containing one or more fastq.gz files.
    • Tested with concatenated Illumina HiSeq output
  • SampleSheet.csv which describes the samples found in fastq.gz files
  • ERCC bowtie indexes built under a subdirectory called ‘data’
  • mix 1 and mix 2 expected values in a file called ‘expected.txt’ inside data

Output:

A sub-directory in the starting directory containing the following:

  • Alignments to the ERCC control sequences
  • Basic RPKM outputs for each sample
  • Graphs displaying dynamic range of samples compared to mix 1 and mix 2

To Use

  • git clone into directory of your choice
  • create ‘data’ subdirectory
  • place the following files inside data:
ERCC92.1.ebwt
ERCC92.2.ebwt
ERCC92.3.ebwt
ERCC92.4.ebwt
ERCC92.fa
ERCC92.fa.fai
expected.txt
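
Before running, it can help to confirm that data is complete. The check below is a small sketch, not part of the repository; the constant and function names are hypothetical, and the file list is copied from above.

```ruby
# Files the README lists as required inside the data subdirectory.
REQUIRED_DATA_FILES = %w[
  ERCC92.1.ebwt ERCC92.2.ebwt ERCC92.3.ebwt ERCC92.4.ebwt
  ERCC92.fa ERCC92.fa.fai expected.txt
]

# Returns the names of required files that are not present in data_dir.
def missing_data_files(data_dir)
  REQUIRED_DATA_FILES.reject { |name| File.exist?(File.join(data_dir, name)) }
end

missing = missing_data_files('data')
if missing.empty?
  puts 'data directory looks complete'
else
  puts "missing from data: #{missing.join(', ')}"
end
```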

expected.txt has the following format:

Re-sort ID  ERCC ID subgroup  concentration in Mix 1 (atmol/ul) concentration in Mix 2 (atmol/ul) expected fold-change ratio log2(fold-change) expected dCt
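
A rough sketch of reading this file follows. It assumes the file is tab-delimited with a single header line, and that ERCC ID, subgroup, and the two concentrations are the second through fifth columns; none of that is confirmed here, so adjust the indexes to match your copy. The struct, function names, and sample row are illustrative only.

```ruby
# One row of expected.txt (column positions are an assumption).
ExpectedRow = Struct.new(:ercc_id, :subgroup, :mix1, :mix2)

# Parse tab-delimited text, skipping the header line and blank lines.
def parse_expected(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  lines.drop(1).map do |line|
    cols = line.split("\t")
    ExpectedRow.new(cols[1], cols[2], cols[3].to_f, cols[4].to_f)
  end
end

# log2 of the Mix 1 to Mix 2 concentration ratio.
def log2_fold_change(row)
  Math.log2(row.mix1 / row.mix2)
end

# Illustrative row with a placeholder ID, not taken from the real file.
sample = "Re-sort ID\tERCC ID\tsubgroup\tMix 1\tMix 2\tratio\tlog2\n" \
         "1\tERCC-EXAMPLE\tA\t30000\t7500\t4\t2\n"
parse_expected(sample).each do |row|
  puts "#{row.ercc_id} #{row.subgroup} #{log2_fold_change(row)}"
end
```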

Run ./spikein_run.rb /path/to/fastq.gz/files, where the path is the directory containing the fastq.gz files.

SampleReport.csv has the following fields:

output,lane,sample name,illumina index,custom barcode,read,reference,total reads,pass filter reads,pass filter percent,align percent,type,read length

Only output, lane, sample name, and illumina index should be necessary.

output should be the name of the fastq file. The sample name, together with the lane, is used to name the spikein analysis output.

Example:

output,lane,sample name,illumina index,custom barcode,read,reference,total reads,pass filter reads,pass filter percent,align percent,type,read length
test.fastq,1,Illumina_1ug_1,ACAGTG,,1,mm9,38088422,38088422,100.00,69.60,single,51
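
To show how the four necessary fields might be pulled from such a file, here is a sketch using Ruby's standard CSV library. The function name and the label format (sample name and lane joined by an underscore) are assumptions made for illustration; the script's real naming scheme may differ.

```ruby
require 'csv'

# The example rows from above, embedded so the sketch is self-contained.
SAMPLE_CSV = <<CSV_TEXT
output,lane,sample name,illumina index,custom barcode,read,reference,total reads,pass filter reads,pass filter percent,align percent,type,read length
test.fastq,1,Illumina_1ug_1,ACAGTG,,1,mm9,38088422,38088422,100.00,69.60,single,51
CSV_TEXT

# Keep only the fields described as necessary, plus a hypothetical label.
def read_samples(csv_text)
  CSV.parse(csv_text, :headers => true).map do |row|
    {
      :fastq       => row['output'],
      :lane        => row['lane'],
      :sample_name => row['sample name'],
      :index       => row['illumina index'],
      # Assumed naming scheme: "<sample name>_<lane>".
      :label       => "#{row['sample name']}_#{row['lane']}"
    }
  end
end

read_samples(SAMPLE_CSV).each do |s|
  puts "#{s[:fastq]} -> #{s[:label]} (index #{s[:index]})"
end
```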

Enjoy!

Dependencies

Only tested on Mac OS X 10.7 and Linux (CentOS 6).

  • ruby 1.9
  • R
  • samtools
  • bowtie
  • zcat
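
A quick way to see which of these tools are available is to search PATH for each executable. This is a hypothetical helper, not something shipped with the repository, and it does not check versions.

```ruby
# True if an executable file named cmd exists in any PATH directory.
def on_path?(cmd)
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).any? do |dir|
    path = File.join(dir, cmd)
    File.executable?(path) && !File.directory?(path)
  end
end

# Executable names assumed to match the dependency list above.
TOOLS = %w[ruby R samtools bowtie zcat]

# Returns the subset of tools that could not be found.
def missing_tools(tools = TOOLS)
  tools.reject { |tool| on_path?(tool) }
end

missing = missing_tools
puts missing.empty? ? 'all tools found' : "not found on PATH: #{missing.join(', ')}"
```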
