Skip to content
This repository has been archived by the owner on Dec 29, 2023. It is now read-only.
/ mitofree Public archive

Automated mitochondrial genome assembly using SRA public data

Notifications You must be signed in to change notification settings

gavieira/mitofree

Repository files navigation

Mitofree

THIS REPO HAS BEEN ARCHIVED

You can use the Zenodo DOI to cite this code:

DOI

Docker image available here


A pipeline for automated mitochondrial genome assembly using public data.

Dependencies:

All of the above dependencies can be easily installed through Bioconda. However, due to dependency conflicts (between Python 2 and 3, for instance), manual creation of a conda environment for Mitofree can be a little tricky. Thus, we encourage using the docker image to run this software. Even though this README has been written to be as accessible as possible, it is highly recommended to learn a bit about docker if you're not familiar with it.

Running Mitofree with docker:

1 - Install docker

2 - Download the latest image:

docker pull gavieira/mitofree:latest

3- Use the following command to generate a container:

docker run --name mitofree -i -t -v ~:/mnt -w /mnt gavieira/mitofree /bin/bash

OBS: The contiainer has been created. Thus, the next time you need to run Mitofree, you can skip previous steps by simply starting the container and going straight to step 4. To start the container, run:

docker start -i mitofree

4- Finally, run Mitofree:

Basic usage:

nohup mitofree.py dataset_list.txt >mitofree.out 2>mitofree.err &

Mitofree's help message:

usage: mitofree.py [-h] [-S] [-M] [--novop_kmer] [--mitob_kmer] [-g] [-s] [-T]
                   FILENAME

Downloads sra NGS data and assembles mitochondrial contigs using NOVOPlasty
and MITObim

positional arguments:
  FILENAME           Path to file with multiple accessions (one per line)

optional arguments:
  -h, --help         show this help message and exit
  -S, --savespace    Automatically removes residual assembly files such as
                     fastq and mitobim iterations
  -M , --maxmemory   Limit of RAM usage for NOVOPlasty. Default: no limit
  --novop_kmer       K-mer used in NOVOPlasty assembly. Default: 39
  --mitob_kmer       K-mer used in MITObim assembly. Default: 73
  -g , --gencode     Genetic code table. Default: 2 (Vertebrate Mitochondrial)
  -s , --subset      Max number of reads used in the assembly process.
                     Default: 50 million reads
  -T , --timeout     Custom timeout for MITObim, in hours. Default: 24h

Please note the -M "--maxmemory" argument, that limits NOVOPlasty's RAM usage (in GB). If you are running this software from a machine with limited RAM available, you will want to set this option so that it won't use all your memory. For instance, if you have a 8GB computer, you may want to use "-M 7".

The -s "--subset" argument can be used to limit dataset size, which can also reduce RAM requirements. This argument can also be used to increase dataset size, which may be useful if you're having trouble in circularizing a mitogenome and got some RAM to spare.

Example of Mitofree's input file:

Basically, this file consists of three tab-separated collumns, each with a specific information:

1-SRA_RUN_NUMBER 2-SPECIES_NAME 3-SEED_GENBANK_ACCESSION

For instance:

ERR1306022	Species1	MK297287
ERR7295165	Species2	MK297241
ERR1306034	Species3	MK291745
#SRR4409513	Species4	MK291678 #This assembly will be skipped

Each line corresponds to a different assembly. This way, you can build a list of as many organisms as you want and assemble their mitogenomes all at once. It is also possible to skip an assembly by adding a hash symbol (#) at the start of its corresponding line.