Script to clean Illumina paired-end sequences produced with the Nextera kit. Bases below Q30, Ns, and Nextera adapters are removed. Bases can also be removed at the beginning and end of each sequence. At the end, the clean files can be analyzed with FastQC.

GenomicaMicrob/nextera_cleaner

nextera_cleaner

Bash script to clean Illumina paired-end sequences produced with the Nextera kit.

This script processes both paired-end files and asks for a common name for the resulting sequences. It trims bases below a given Phred score and removes Ns and sequences shorter than 20 bases. It can also delete bases from the beginning of the sequences and trim the sequences to a certain length by removing bases from the 3' end. If the files are compressed (.gz), it decompresses them automatically. Afterwards, it asks whether you want to merge the paired-end sequences (with flash), convert them to fasta, and run FastQC on the resulting files.
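The trimming steps described above map naturally onto Cutadapt options. The following is only a sketch of how such a call could be assembled, not the actual contents of nextera_cleaner. The function name build_cutadapt_cmd, the output file names, and the adapter sequence (the commonly cited Nextera transposase sequence, assumed here to match nextera_adapter.tsv) are assumptions made for illustration. The function only prints the command so it can be inspected before running it.

```shell
# Hypothetical sketch; the real script may call Cutadapt differently.
# Assumed adapter: commonly cited Nextera transposase sequence.
ADAPTER="CTGTCTCTTATACACATCT"

# Print a Cutadapt command line for one pair of files.
# Args: R1 R2 output_prefix [bases_to_cut_at_start] [final_length]
build_cutadapt_cmd() {
  r1=$1; r2=$2; prefix=$3; head=${4:-0}; len=${5:-0}
  # -q 30: quality-trim below Q30; --trim-n: remove flanking Ns;
  # -m 20: discard reads shorter than 20 bases; -a/-A: adapter on R1/R2
  cmd="cutadapt -q 30 --trim-n -m 20 -a $ADAPTER -A $ADAPTER"
  # -u/-U: remove a fixed number of bases from the start of R1/R2
  if [ "$head" -gt 0 ]; then cmd="$cmd -u $head -U $head"; fi
  # -l: shorten reads to a fixed length by removing bases from the 3' end
  if [ "$len" -gt 0 ]; then cmd="$cmd -l $len"; fi
  # Output names below are hypothetical
  cmd="$cmd -o ${prefix}_R1.clean.fastq -p ${prefix}_R2.clean.fastq $r1 $r2"
  printf '%s\n' "$cmd"
}

# Example: cut 10 bases at the start and trim to 200 bases
build_cutadapt_cmd file_R1.fastq file_R2.fastq sample 10 200
```

Printing the command instead of running it keeps the sketch safe to try without Cutadapt installed; file names containing spaces would need extra quoting.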

INSTALLATION

  1. Download the latest release to any directory in your system.
  2. Decompress it: tar xzf nextera_cleaner.v0.1.0.tar.gz
  3. Make it executable: chmod +x nextera_cleaner.v0.1.0.sh

Be sure to keep the nextera_adapter.tsv and contaminants.tsv files in the same folder as the script; these files are desirable for the FastQC step.

You can then create a symbolic link to the script so that you can call it from any directory.
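One possible way to create that link is sketched below. It assumes that the script was unpacked in the current directory and that a personal bin directory (here $HOME/bin, an assumption) is listed in your PATH; SCRIPT_DIR and BIN_DIR are illustrative variable names.

```shell
# Assumption: the script lives in the current directory
SCRIPT_DIR="${SCRIPT_DIR:-$PWD}"
# Assumption: $HOME/bin is (or will be) part of PATH
BIN_DIR="${BIN_DIR:-$HOME/bin}"

mkdir -p "$BIN_DIR"
# -s creates a symbolic link; -f replaces an existing link of the same name
ln -sf "$SCRIPT_DIR/nextera_cleaner.v0.1.0.sh" "$BIN_DIR/nextera_cleaner.v0.1.0.sh"
```

If $HOME/bin is not in your PATH yet, adding it in your shell startup file is the usual remedy.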

USAGE

$ nextera_cleaner.v0.1.0.sh file_R1.fastq file_R2.fastq

Where file_R1.fastq and file_R2.fastq are the files provided by the Illumina sequencer.

The script will ask whether you want to trim bases at the beginning and at the end of the sequences. To choose appropriate numbers in both cases, it is recommended to first run FastQC on the raw sequences (file_R1.fastq file_R2.fastq), check the output, and then decide whether you need to trim.
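A preliminary FastQC run on the raw files could look like the sketch below. The output directory name fastqc_raw is an assumption; FastQC expects the directory given to -o to exist already, so it is created first.

```shell
# Hypothetical output directory for the raw-data reports
mkdir -p fastqc_raw

if command -v fastqc >/dev/null 2>&1; then
  # Writes one HTML report and one zip archive per input file into fastqc_raw
  fastqc -o fastqc_raw file_R1.fastq file_R2.fastq
else
  echo "fastqc not found in PATH" >&2
fi
```

In the reports, the per-base sequence quality and per-base sequence content plots are typically the ones to look at when deciding how many bases to remove from each end.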

DEPENDENCIES

You need the following program in your PATH:

-Cutadapt

If you want to merge the sequences, you also need:

-flash

Finally, FastQC is optional:

-FastQC
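To check which of these programs are reachable before running the script, a small test like the following can help. It assumes the executables are named cutadapt, flash, and fastqc, which is how they are usually installed; check_tool is a hypothetical helper name.

```shell
# Report whether an executable can be found in PATH
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Executable names are assumptions; adjust them to your installation
for tool in cutadapt flash fastqc; do
  check_tool "$tool" || true
done
```

Only a missing cutadapt should prevent cleaning; flash and fastqc matter only if you choose to merge the reads or run the quality reports.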
