A pipeline to identify taxon-specific k-mers in plant genomes

bioinfo-ut/PlantTaxSeeker


IDENTIFICATION OF PLANT TAXON SPECIFIC K-MERS AND COUNTING K-MERS FROM METAGENOMIC WGS READS

Scripts for identifying taxon-specific k-mers from plant genomes and for detecting and counting those k-mers directly in WGS reads of a metagenomic sample.

PlantTaxSeeker scripts are licensed under the GPLv3 license.

The scripts consist predominantly of code written in Python (tested on a UNIX server with Python versions 2.7 and 3.3) and also use:
glistmaker, glistcompare, glistquery, MakeUnion.pl and gmer_counter from the GenomeTester4 package

Usage

1. To identify target taxon specific k-mers, use the command:

python identification_of_taxon_specific_kmers.py <Targets.fasta> <Nontargets.fasta> [optional_arguments]

The following optional arguments can be specified:

  • -w Length of the k-mer (default value 32)
  • -f The minimum number of target sequences that must contain each specific k-mer (default value 1)

Input files:

  • Target taxon genome sequences as a FASTA file
  • Nontarget taxa genome sequences as a FASTA file

Output files:

  • The list of target taxon specific k-mers (the count of k-mers and sequences) as a binary file
  • The list of target taxon specific k-mers (the count of k-mers and sequences) as a text file
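To make the roles of -w and -f concrete, the selection rule described above can be sketched in plain Python. This is only a toy illustration, not the code in identification_of_taxon_specific_kmers.py, which relies on the GenomeTester4 tools. The function names are hypothetical, sequences are passed as in-memory strings rather than FASTA files, and only the forward strand is considered.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in one sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_kmers(targets, nontargets, k=32, min_targets=1):
    """Hypothetical helper: keep k-mers found in at least `min_targets`
    target sequences and in none of the nontarget sequences.

    `k` plays the role of -w and `min_targets` the role of -f.
    """
    support = Counter()
    for seq in targets:
        # A set is passed in, so each sequence adds at most 1 per k-mer.
        support.update(kmers(seq, k))

    excluded = set()
    for seq in nontargets:
        excluded |= kmers(seq, k)

    return {km: n for km, n in support.items()
            if n >= min_targets and km not in excluded}


# Toy data, invented for illustration only.
targets = ["ACGTACGA", "ACGTTTGA"]
nontargets = ["TTGACC"]
shared = specific_kmers(targets, nontargets, k=4, min_targets=2)
print(shared)
```

With the toy data, only ACGT occurs in both target sequences, and TTGA is discarded because it also occurs in the nontarget sequence.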

2. To filter out additional non-specific k-mers using whole genome sequencing raw reads or assembled sequences of nontarget taxa, use the command:

python filtering_with_nontargets.py <Specific_kmers.list> <Nontarget1.fastq> [Nontargets fastqs] [optional_arguments]

The following optional arguments can be specified:

  • -w Length of the k-mer (default value 32)
  • -f The k-mer frequency cutoff: only k-mers that occur in the nontarget sequences at least this many times are filtered out of the target k-mer list (default value 10)

Input files:

  • Unfiltered list of target taxon specific k-mers as a binary file (the output file of identification_of_taxon_specific_kmers.py)
  • Nontarget taxon fastq files for filtering out nonspecific k-mers

Output files:

  • List of target taxon specific k-mers as a binary file (contains only k-mers that are not in the nontarget taxa fastq files)
  • List of target taxon specific k-mers as a TXT file
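The effect of the frequency cutoff in this step can be illustrated with a short Python sketch. It is an assumption-laden toy version, not the logic of filtering_with_nontargets.py itself: the function names are hypothetical, reads are given as plain strings instead of fastq files, and reverse complements are ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def filter_with_nontargets(specific, nontarget_reads, k=32, cutoff=10):
    """Hypothetical helper: drop every k-mer seen at least `cutoff` times
    in the nontarget reads (`cutoff` plays the role of -f)."""
    counts = count_kmers(nontarget_reads, k)
    # Counter returns 0 for k-mers that never occur in the reads.
    return {km for km in specific if counts[km] < cutoff}


# Toy data, invented for illustration only.
specific = {"ACGT", "GGCC"}
nontarget_reads = ["ACGTA", "TACGT", "GGCCA"]
kept = filter_with_nontargets(specific, nontarget_reads, k=4, cutoff=2)
print(sorted(kept))
```

In this sketch a cutoff above 1 tolerates k-mers that appear only a few times in the nontarget reads, for example because of sequencing errors, while k-mers that recur at or above the cutoff are removed.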

An example: the identification of Solanum lycopersicum specific k-mers:

A README file for running the scripts that identify Solanum lycopersicum specific k-mers is available on GitHub

3. To detect and count plant taxa specific k-mers in whole genome sequencing raw reads of a metagenomic sample, use the command:

python plant_taxa_kmers_counter.py <Specific_kmers.list> <Metagenomic_sample.fastq> [optional_argument]

The following optional argument can be specified:

  • -f The k-mer frequency cutoff: only k-mers that occur at least this many times in the metagenomic sequencing reads are counted (default value 1)

Input files:

  • List of target taxon specific k-mers as a TXT file (the output file of identification_of_taxon_specific_kmers.py)
  • fastq file of WGS reads from the metagenomic sample

Output:

  • The count of target plant taxon specific k-mers detected in the WGS reads of the metagenomic sample.
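The counting step can be pictured with the following Python sketch. It is a simplified stand-in rather than the behaviour of plant_taxa_kmers_counter.py, which uses gmer_counter from GenomeTester4. The function name is hypothetical, the k-mer list and reads are plain in-memory values, all k-mers are assumed to have the same length, and only the forward strand is scanned.

```python
from collections import Counter


def count_specific_kmers(specific, reads, cutoff=1):
    """Hypothetical helper: count occurrences of the listed k-mers in the
    reads and report only those seen at least `cutoff` times
    (`cutoff` plays the role of -f)."""
    if not specific:
        return {}
    k = len(next(iter(specific)))  # assumes every k-mer has the same length
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            km = read[i:i + k]
            if km in specific:
                counts[km] += 1
    return {km: n for km, n in counts.items() if n >= cutoff}


# Toy data, invented for illustration only.
specific = {"ACGT", "GGCC", "TTTT"}
reads = ["ACGTACGT", "GGCCA"]
detected = count_specific_kmers(specific, reads, cutoff=1)
print(len(detected), "specific k-mers detected:", detected)
```

Here TTTT is never reported because it does not occur in the reads, and raising the cutoff to 2 would also drop GGCC, which occurs only once.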

An example: the identification of Lupinus spp. specific k-mers and the counting of Lupinus spp. specific k-mers in WGS reads of a lupin-containing cookie:

A README file for running the scripts that identify Lupinus spp. specific k-mers and count them in cookie WGS data is available on GitHub
