Quicksand-build

The quicksand helper-pipeline

See the GitHub Pages of quicksand for comprehensive documentation of the pipeline.

Requirements
Quickstart
Parameters
quicksand

This repository is an addition to the mpieva/quicksand pipeline. Starting quicksand-build downloads the mitochondrial genomes from the current NCBI/RefSeq release and creates, for the given taxa, the data structure and files required by the quicksand pipeline.

Make sure to check the RefSeq website and note down the current RefSeq release that is used for your database.

The output of the pipeline is structured as follows:

    ncbi: 
         mitochondrion.{n}.genomic.gbff.gz - raw downloaded files from NCBI
    genomes: 
         genomes/{family}/{species}.fasta - The indexed mitochondrial genomes used for mapping with bwa
         genomes/taxid_map.tsv - A table with all nodes in the database - used to get all reference genomes for one taxon ID
    masked:
         masked/{species}.masked.bed - Bed files for all species in the database showing low-complexity regions
    kraken:
         kraken/Mito_db_kmer{kmersize} - A preindexed Kraken-database for the given kmers containing all the species in the database
    work: contains nextflow-specific files and can be deleted after the run
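
As a quick sanity check after a run, the layout above can be verified with a few lines of shell. This is a minimal sketch, not part of the pipeline: the function name `check_layout` is hypothetical, and it assumes that `ncbi`, `genomes`, `masked` and `kraken` sit directly under the output directory.

```shell
# Hypothetical helper: report which documented output paths are missing.
# Usage: check_layout <outdir> <kmersize>
check_layout() {
    cl_outdir="$1"
    cl_kmer="$2"
    cl_status=0
    for cl_path in \
        "$cl_outdir/ncbi" \
        "$cl_outdir/genomes" \
        "$cl_outdir/genomes/taxid_map.tsv" \
        "$cl_outdir/masked" \
        "$cl_outdir/kraken/Mito_db_kmer$cl_kmer"
    do
        # Each missing path is reported on stderr and marks the check as failed.
        if [ ! -e "$cl_path" ]; then
            echo "missing: $cl_path" >&2
            cl_status=1
        fi
    done
    return "$cl_status"
}
```

After a run with default parameters, `check_layout out 22` should return successfully without printing anything.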

Requirements

To run the pipeline, the following programs need to be installed:

Nextflow (tested on v.20.04.10): Installation
Singularity (tested on v3.7.1): Installation or Docker

Quickstart

To run the pipeline with default parameters, open the terminal and type:

nextflow run mpieva/quicksand-build -profile singularity

This will construct the Kraken database for kmer size 22 from all mitochondrial genomes in the current RefSeq release.

Parameters

The pipeline accepts the following parameters:

  Pipeline ARGS
       --outdir  PATH    : Directory to save the output in. Default = "out"
       --kmers   KMERS   : Comma-separated list of kmers for which databases are created (e.g. 21,22,23). Default=22
       --include STRING  : comma-separated string of Taxa that should be in the DB, e.g. "Mammalia". Default='root'
       --exclude STRING  : comma-separated string of Taxa that mustn't be in the DB, e.g. "Pan,Gorilla".

  Nextflow ARGS (only one dash!)
       -profile  PROFILE : Run the pipeline with the assigned profile (see profiles below)
       -resume           : Resume the previous run (if it was stopped in the meantime)
       -w        PATH    : Specify a different "work" directory for intermediate files
       -c        PATH    : Path to a nextflow.config file that provides ADDITIONAL parameters
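
To illustrate how the two kinds of arguments combine, the sketch below assembles a non-default call from the options listed above and prints it for review. The values (`refseq_mito`, `21,22`, `Mammalia`, `Pan,Gorilla`) and the variable names are only examples; note the double dash for pipeline arguments and the single dash for Nextflow arguments.

```shell
# Example values only; adjust to your own project.
OUTDIR="refseq_mito"      # --outdir: where the output is saved
KMERS="21,22"             # --kmers: kmer sizes for which databases are created
INCLUDE="Mammalia"        # --include: taxa that should be in the DB
EXCLUDE="Pan,Gorilla"     # --exclude: taxa that must not be in the DB

# Pipeline arguments use two dashes, Nextflow arguments (-profile) one.
BUILD_CMD="nextflow run mpieva/quicksand-build -profile singularity \
--outdir $OUTDIR --kmers $KMERS --include $INCLUDE --exclude $EXCLUDE"

# Print the command for review; paste it into the terminal to start the run.
echo "$BUILD_CMD"
```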

quicksand

To integrate the created data structure, run the quicksand pipeline with the following parameters:

    --genome <OUTDIR>/genomes
    --bedfiles <OUTDIR>/masked
    --db <OUTDIR>/kraken/Mito_db_kmer<KMER>/
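
The three paths follow directly from the values used for the build, so they can be derived rather than typed by hand. A minimal sketch, assuming the defaults `out` and `22`; the variable names are hypothetical, and any other arguments quicksand needs for its input data are not covered here.

```shell
OUTDIR="out"   # value passed to --outdir during the build (default "out")
KMER=22        # one of the kmer sizes passed to --kmers (default 22)

# Flags that point quicksand at the freshly built database.
QUICKSAND_DB_ARGS="--genome $OUTDIR/genomes --bedfiles $OUTDIR/masked --db $OUTDIR/kraken/Mito_db_kmer$KMER/"

# Print the flags so they can be appended to a quicksand command line.
echo "$QUICKSAND_DB_ARGS"
```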
