Usage

Main scripts

get_organelle_from_reads.py

The minimal recipe for assembling an organelle genome (e.g. a chloroplast genome) from reads is:

get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -o plastome_output -R 15 -k 21,45,65,85,105 -F embplant_pt
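
When the recipe has to be launched for many samples from a script, it can help to build the argument list programmatically. The following is only a minimal sketch: the helper name `build_reads_command` is hypothetical (it is not part of GetOrganelle), and it uses only the flags and values shown in the recipe above.

```python
import shlex

def build_reads_command(forward, reverse, out_dir, organelle="embplant_pt",
                        rounds=15, kmers=(21, 45, 65, 85, 105)):
    """Return the argument list for the minimal recipe (hypothetical helper)."""
    return [
        "get_organelle_from_reads.py",
        "-1", forward,                          # forward paired-end reads
        "-2", reverse,                          # reverse paired-end reads
        "-o", out_dir,                          # output directory
        "-R", str(rounds),                      # maximum extension rounds
        "-k", ",".join(str(k) for k in kmers),  # SPAdes kmer settings
        "-F", organelle,                        # target organelle genome type
    ]

cmd = build_reads_command("forward.fq", "reverse.fq", "plastome_output")
print(shlex.join(cmd))

# To actually run it (requires GetOrganelle to be installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with unusual file names.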

A brief introduction to frequently used arguments can be displayed with:

get_organelle_from_reads.py -h

`-1`	str	Input file with forward paired-end reads (*.fq/.gz/.tar.gz).
`-2`	str	Input file with reverse paired-end reads (*.fq/.gz/.tar.gz).
`-u`	str	Input file(s) with unpaired (single-end) reads.
`-o`	str	Output directory.
`-s`	str	Input fasta format file as initial seed. SEQ_DB_PATH/*.fasta
`-w`	int/float	Word size (W) for extension. Default: auto-estimated
`-R`	int	Maximum extension rounds (suggested: >=2). Default: 15 (embplant_pt)
`-F`	str	Target organelle genome type(s): embplant_pt other_pt embplant_mt embplant_nr animal_mt fungus_mt fungus_nr anonym embplant_pt,embplant_mt other_pt,embplant_mt,fungus_mt
`--max-reads`	int	Maximum number of reads to be used per file. Default: 1.5E7 (-F embplant_pt/embplant_nr/fungus_mt/fungus_nr); 7.5E7 (-F embplant_mt/other_pt/anonym); 3E8 (-F animal_mt)
`--fast`		="-R 10 -t 4 -J 5 -M 7 --max-n-words 3E7 --larger-auto-ws --disentangle-time-limit 360"
`-k`	int[,int]	SPAdes kmer settings. Default: 21,55,85,115
`-t`	int	Maximum threads to use. Default: 1
`-P`	int	Pre-grouping value. Default: int(2E5)
`-v`		print the current version of GetOrganelle
`-h`		print brief introduction for frequently-used options.
`--help`		print verbose introduction for all options.

A detailed introduction to all arguments can be displayed with:

get_organelle_from_reads.py --help

`-1`	str	Input file with forward paired-end reads (format: fastq/fastq.gz/fastq.tar.gz).
`-2`	str	Input file with reverse paired-end reads (format: fastq/fastq.gz/fastq.tar.gz).
`-u`	str	Input file(s) with unpaired (single-end) reads (format: fastq/fastq.gz/fastq.tar.gz). files could be comma-separated lists such as 'seq1.fq,seq2.fq'.
`-o`	str	Output directory. Overwriting files if directory exists.
`-s`	str	Seed sequence(s). Input fasta format file as initial seed. A seed sequence in GetOrganelle is only used for identifying initial organelle reads. The assembly process is purely de novo. Should be a list of files split by comma(s) on a multi-organelle mode, with the same list length to organelle_type (followed by '-F'). Default: SEQ_DB_PATH/*.fasta
`-a`	str	Anti-seed(s). Not suggested unless you really know what you are doing. Input fasta format file as anti-seed, where the extension process stops. Typically used to exclude plastid reads when extending mitochondrial reads, or the other way around. Be cautious when using this option: if the anti-seed includes a word that is in the target but not in the seed, the result will have gaps. For example, use the embplant_mt and embplant_pt from the same plant species as seed and anti-seed.
`--max-reads`	int	Hard bound for maximum number of reads to be used per file. An input larger than 536870911 will be treated as infinity (INF). Default: 1.5E7 (-F embplant_pt/embplant_nr/fungus_mt/fungus_nr); 7.5E7 (-F embplant_mt/other_pt/anonym); 3E8 (-F animal_mt)
`--reduce-reads-for-coverage`	float	Soft bound for maximum number of reads to be used according to target-hitting base coverage. If the estimated target-hitting base coverage exceeds this VALUE, GetOrganelle automatically reduces the number of reads to generate a final assembly with base coverage close to this VALUE. This design can greatly save computational resources in many situations. A mean base coverage over 500 is more than sufficient for most cases. This VALUE must be larger than 10. Set this VALUE to inf to disable reducing. Default: 500.
`--max-ignore-percent`	float	The maximum percent of bases to be ignored in extension, due to low quality. Default: 0.01
`--phred-offset`	int	Phred offset for spades-hammer. Default: GetOrganelle-autodetect
`--min-quality-score`	int	Minimum quality score in extension. This value would be automatically decreased to prevent ignoring too much raw data (see --max-ignore-percent). Default: 1 ('"' in Phred+33; 'A' in Phred+64/Solexa+64)
`--prefix`	str	Add extra prefix to resulting files under the output directory.
`--out-per-round`		Enable output per round. Choose to save memory but cost more time per round.
`--zip-files`		Choose to compress fq/sam files using gzip.
`--keep-temp`		Choose to keep the running temp/index files.
`--config-dir`	str	The directory where the configuration file and default databases were placed. The default value also can be changed by adding 'export GETORG_PATH=your_favor' to the shell script (e.g. ~/.bash_profile or ~/.bashrc) Default: ~/.GetOrganelle
`-F`	str	This flag should be followed with embplant_pt (embryophyta plant plastome), other_pt (non-embryophyta plant plastome), embplant_mt (plant mitogenome), embplant_nr (plant nuclear ribosomal RNA), animal_mt (animal mitogenome), fungus_mt (fungus mitogenome), fungus_nr (fungus nuclear ribosomal RNA), or embplant_mt,other_pt,fungus_mt (the combination of any of the above organelle genomes split by comma(s), which might be computationally more intensive than separate runs), or anonym (uncertain organelle genome type). The anonym type should be used with customized seed and label databases ('-s' and '--genes'). For ease of use and compatibility with old versions, the following redirections are applied automatically without warning: plant_cp->embplant_pt; plant_pt->embplant_pt; plant_mt->embplant_mt; plant_nr->embplant_nr
`--fast`		="-R 10 -t 4 -J 5 -M 7 --max-n-words 3E7 --larger-auto-ws --disentangle-time-limit 360" This option is suggested for homogeneously and highly covered data (very fine data). You can overwrite the value of a specific option listed above by adding that option along with the "--fast" flag. You could try GetOrganelle with this option for a list of samples and run a second time without this option for the rest with incomplete results.
`--memory-save`		="--out-per-round -P 0 --remove-duplicates 0" You can overwrite the value of a specific option listed above by adding that option along with the "--memory-save" flag. A larger '-R' value is suggested when "--memory-save" is chosen.
`--memory-unlimited`		="-P 1E7 --index-in-memory --remove-duplicates 2E8 --min-quality-score -5 --max-ignore-percent 0" You can overwrite the value of a specific option listed above by adding that option along with the "--memory-unlimited" flag.
`-w`	int/float	Word size (W) for pre-grouping (if not assigned by '--pre-w') and extension. Default: auto-estimated
`--pre-w`	int/float	Word size (W) for pre-grouping. Used to reproduce a result when the word size had a certain value during the pre-grouping process and was later changed during the read-extending process. Similar to word size. Default: the same as word size.
`-R`	int	Maximum number of extending rounds (suggested: >=2). Default: 15 (-F embplant_pt), 30 (-F embplant_mt/other_pt), 10 (-F embplant_nr/animal_mt/fungus_mt/fungus_nr), inf (-P 0).
`--max-n-words`	int	Maximum number of words to be used in total. Default: 4E8 (-F embplant_pt), 2E8 (-F embplant_nr/fungus_mt/fungus_nr/animal_mt), 2E9 (-F embplant_mt/other_pt)
`-J`	int	The length of step for checking words in reads during the extending process (integer >= 1). When you have reads of high quality, the larger the number is, the faster the extension will be, but the higher the risk of missing reads in low-coverage areas. Choose 1 for the slowest but safest extension strategy. Default: 3
`-M`	int	(Beta parameter) The length of step for building words from seeds during the extending process (integer >= 1). When you have reads of high quality, the larger the number is, the faster the extension will be, but the higher the risk of missing reads in low-coverage areas. Another use of this mesh size is to couple a larger mesh size with a smaller word size, which makes a smaller word size feasible when memory is limited. Choose 1 for the slowest but safest extension strategy. Default: 2
`--bowtie2-options`	"str"	Bowtie2 options, such as '--ma 3 --mp 5,2 --very-fast -t'. Default: --very-fast -t.
`--larger-auto-ws`		With this flag, the empirical function for estimating W tends to produce a relatively larger W, which speeds up matching and reduces memory cost during extending, but increases the risk of a broken final graph. Suggested when the data is good, with high and homogeneous coverage.
`--target-genome-size`		Hypothetical value(s) of target genome size. This is only used for estimating word size when no '-w word_size' is given. Should be a list of INTEGER numbers split by comma(s) on a multi-organelle mode, with the same list length to organelle_type (followed by '-F'). Default: 130000 (-F embplant_pt) 390000 (-F embplant_mt) 13000 (-F embplant_nr) 39000 (-F other_pt) 13000 (-F animal_mt) 65000 (-F fungus_mt) 13000 (-F fungus_nr) 39000,390000,65000 (-F other_pt,embplant_mt,fungus_mt)
`--max-extending-len`	int	Maximum extending length(s) derived from the seed(s). A single value could be a non-negative number, inf (infinite), or auto (automatic estimation). This is designed to stop the extension from getting too long and to save computational resources. However, empirically, a maximum extending length larger than 6000 would not help save computational resources. This value is not precise in controlling output size, especially when pre-grouping (followed by '-P') is turned on. In the auto mode, the maximum extending length is estimated based on the sizes of the gap regions that are not covered by the seed sequences. A sequence of a closely related species would be preferred for estimating a better maximum extending length value. If you are using limited loci, e.g. the rbcL gene, as the seed for assembling the whole plastome (with extending length ca. 75000 >> 6000), you should set the maximum extending length to inf. Should be a list of numbers/auto/no split by comma(s) on a multi-organelle mode, with the same list length to organelle_type (followed by '-F'). Default: inf.
`-k`	int[,int]	SPAdes kmer settings. Use the same format as in SPAdes. Illegal kmer values would be automatically discarded by GetOrganelle. Default: 21,55,85,115
`--spades-options`	"str"	Other SPAdes options. Use double quotation marks to include all the arguments and parameters.
`--no-spades`		Disable SPAdes.
`--ignore-k`	int	A kmer threshold below which no slimming/disentangling would be executed on the result. Default: 40
`--genes`	str	Followed with a customized database (a fasta file or the base name of a blast database) containing or made of ONE set of protein coding genes and ribosomal RNAs extracted from ONE reference genome that you want to assemble. Should be a list of databases split by comma(s) on a multi-organelle mode, with the same list length to organelle_type (followed by '-F'). This is optional for any organelle mentioned in '-F' but required for 'anonym'. By default, default database(s) in LBL_DB_PATH would be used contingent on the organelle types chosen (-F). The default value will become invalid when '--genes' or '--ex-genes' is used.
`--ex-genes`	str	This is optional and Not suggested, since non-target contigs could contribute information for better downstream coverage-based clustering. Followed with a customized database (a fasta file or the base name of a blast database) containing or made of protein coding genes and ribosomal RNAs extracted from reference genome(s) that you want to exclude. Could be a list of databases split by comma(s) but NOT required to have the same list length to organelle_type (followed by '-F'). The default value will become invalid when '--genes' or '--ex-genes' is used.
`--disentangle-df`	float	Depth factor for differentiating the genome type of contigs. The genome type of contigs is determined by blast. Default: 10.0
`--contamination-depth`	float	Depth factor for confirming contamination in parallel contigs. Default: 3.
`--contamination-similarity`	float	Similarity threshold for confirming contaminating contigs. Default: 0.9
`--no-degenerate`		Disable making consensus from parallel contig based on nucleotide degenerate table.
`--degenerate-depth`	float	Depth factor for confirming parallel contigs. Default: 1.5
`--degenerate-similarity`	float	Similarity threshold for confirming parallel contigs. Default: 0.98
`--disentangle-time-limit`	int	Time limit (seconds) for each try of disentangling a graph file as a circular genome. Disentangling a graph as contigs is not limited. Default: 1800
`--expected-max-size`	str	Expected maximum target genome size(s) for disentangling. Should be a list of INTEGER numbers split by comma(s) on a multi-organelle mode, with the same list length to organelle_type (followed by '-F'). Default: 250000 (-F embplant_pt/fungus_mt), 25000 (-F embplant_nr/animal_mt/fungus_nr), 1000000 (-F embplant_mt/other_pt), 1000000,1000000,250000 (-F other_pt,embplant_mt,fungus_mt)
`--expected-min-size`	str	Expected minimum target genome size(s) for disentangling. Should be a list of INTEGER numbers split by comma(s) on a multi-organelle mode, with the same list length to organelle_type (followed by '-F'). Default: 10000 for all
`--reverse-lsc`		For '-F embplant_pt' with a complete circular result, by default, the direction of the starting contig (usually the LSC region) is determined as the direction with fewer ORFs. Choose this option to reverse the direction of the starting contig when the result is circular. Both directions are biologically equivalent to each other. The reordering of the direction is only for easier downstream analysis.
`--max-paths-num`	int	Repeats would dramatically increase the number of potential isomers (paths). This option is used to export a certain number of paths out of all possible paths per assembly graph. Default: 1000
`-t`	int	Maximum threads to use.
`-P`	int	The maximum number (integer) of high-covered reads to be pre-grouped before the extending process. Pre-grouping is suggested when the whole genome coverage is shallow but the organelle genome coverage is deep. The default value is 2E5. For a personal computer with 8G memory, we suggest no more than 3E5. A larger number (e.g. 6E5) would run faster but exhaust memory in the first few minutes. Choose 0 to disable this process.
`--which-blast`	str	Assign the path to BLAST binary files if not added to the path. Default: try GO_DEP_PATH/ncbi-blast first, then $PATH
`--which-bowtie2`	str	Assign the path to Bowtie2 binary files if not added to the path. Default: try GO_DEP_PATH/bowtie2 first, then $PATH
`--which-spades`		Assign the path to SPAdes binary files if not added to the path. Default: try GO_DEP_PATH/SPAdes first, then $PATH
`--which-bandage`	str	Assign the path to bandage binary file if not added to the path. Default: try $PATH
`--continue`		Checkpoints are based on the files produced, rather than on the log file, so keep in mind that this script will NOT detect differences between the current input parameters and the previous ones.
`--overwrite`		Overwrite previous files if they exist.
`--index-in-memory`		Keep index in memory. Choose to keep the index in memory rather than on disk.
`--remove-duplicates`	int	By default this script uses unique reads to extend. Choose the number of duplicates (integer) to be saved in memory. A larger number (e.g. 2E7) would run faster but exhaust memory in the first few minutes. Choose 0 to disable this process. Note that this choice does not disable the calling of replicate reads. Default: 1E7.
`--flush-step`	int	Flush step (INTEGER OR INF) for presenting progress. For running in the background, you could set this to inf, which would disable this. Default: 54321
`--random-seed`	int	Default: 12345
`--verbose`		Verbose output. Choose to enable verbose running log_handler.
`-v`		print the current version of GetOrganelle
`-h`		print brief introduction for frequently-used options.
`--help`		print verbose introduction for all options.
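
The combined flags `--fast`, `--memory-save` and `--memory-unlimited` are described above as shorthand for a set of options, any of which can be overridden by also giving that option explicitly. The sketch below only models that documented behaviour for `--fast`; it is not GetOrganelle's own argument parser, and the helper names `parse_options` and `effective_options` are hypothetical.

```python
import shlex

# The expansion of --fast, copied from the option list above.
FAST_PRESET = ("-R 10 -t 4 -J 5 -M 7 --max-n-words 3E7 "
               "--larger-auto-ws --disentangle-time-limit 360")

def parse_options(option_string):
    """Parse 'flag value' pairs; a flag without a value is stored as True.

    Assumption: values never start with '-', which holds for the --fast
    preset but not for presets that contain negative numbers.
    """
    tokens = shlex.split(option_string)
    options = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
        options[flag] = tokens[i + 1] if has_value else True
        i += 2 if has_value else 1
    return options

def effective_options(preset, user_options):
    """Start from the preset, then let explicitly given options win."""
    merged = parse_options(preset)
    merged.update(parse_options(user_options))
    return merged

# Equivalent in spirit to running with: --fast -t 8 -R 20
opts = effective_options(FAST_PRESET, "-t 8 -R 20")
for flag, value in opts.items():
    print(flag, value)
```

Here `-t` and `-R` take the user's values, while the remaining options keep the values listed for `--fast`.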

get_organelle_from_assembly.py

The basic recipe for assembling an organelle genome (e.g. a chloroplast genome) from an assembly graph is:

get_organelle_from_assembly.py -g assembly_graph.gfa -o plastome_output -F embplant_pt

Tips: You may add --min-depth KMER_DEPTH_THRESHOLD to discard most shallow-coverage contigs for a quick run. If the run still takes a long time, kill it and restart with a larger KMER_DEPTH_THRESHOLD. When the quick run has finished, check whether the coverage of the target genome (reported in the log file) deviates significantly from the KMER_DEPTH_THRESHOLD: if the depth of the output target genome is much larger than the KMER_DEPTH_THRESHOLD, say 5 times larger, the output should generally be free from the influence of the threshold; if the output target genome has a depth similar to the KMER_DEPTH_THRESHOLD, or no target genome was found, try reducing the KMER_DEPTH_THRESHOLD.
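
The check described in the tip can be written down as a small decision rule. This is only a sketch: the function name `advise_min_depth` is hypothetical, the factor of 5 is the "say 5 times larger" from the tip, and treating every depth below that factor as "similar" is a conservative assumption, since the tip does not define where "similar" ends.

```python
def advise_min_depth(threshold, target_depth, safe_ratio=5.0):
    """Suggest what to do after a quick run with --min-depth `threshold`.

    target_depth is the target genome depth reported in the log file,
    or None when no target genome was found.
    """
    if target_depth is None:
        return "reduce"   # nothing found: the threshold may be too high
    if target_depth >= safe_ratio * threshold:
        return "accept"   # result should be free from the threshold's influence
    return "reduce"       # depth too close to the threshold (assumed cutoff)

print(advise_min_depth(10, 60))
print(advise_min_depth(10, 12))
print(advise_min_depth(10, None))
```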

A brief introduction to frequently used arguments can be displayed with:

get_organelle_from_assembly.py -h

`-F`	str	Target organelle genome type(s): embplant_pt/other_pt/embplant_mt/embplant_nr/animal_mt/fungus_mt/fungus_nr/anonym/embplant_pt,embplant_mt/other_pt,embplant_mt,fungus_mt
`-g`		Input assembly graph (fastg/gfa) file.
`-o`	str	Output directory.
`--min-depth`	float	Minimum depth threshold of contigs. Default: 0.
`--max-depth`	float	Maximum depth threshold of contigs. Default: inf.
`--no-slim`		Disable the slimming process and directly disentangle the assembly graph.
`-t`	int	Maximum threads to use. Default: 1.
`--continue`		Resume a previous run. Default: False.

A detailed introduction to all arguments can be displayed with:

get_organelle_from_assembly.py --help

`-F`	str	This flag should be followed with embplant_pt (embryophyta plant plastome), other_pt (non-embryophyta plant plastome), embplant_mt (plant mitochondrion), embplant_nr (plant nuclear ribosomal RNA), animal_mt (animal mitochondrion), fungus_mt (fungus mitochondrion), fungus_nr (fungus nuclear ribosomal RNA), or embplant_mt,other_pt,fungus_mt (the combination of any of above organelle genomes split by comma(s), which might be computationally more intensive than separate runs), or anonym (uncertain organelle genome type). The anonym should be used with customized seed and label databases ('-s' and '--genes').
`-g`		Input assembly graph (fastg/gfa) file. The format will be recognized by the file name suffix.
`-o`	str	Output directory. Overwriting files if directory exists.
`--min-depth`	float	Input a float or integer number. Filter graph file by a minimum depth. Default: 0.
`--max-depth`	float	Input a float or integer number. Filter graph file by a maximum depth. Default: inf.
`--config-dir`	str	The directory where the configuration file and default databases were placed. The default value also can be changed by adding 'export GETORG_PATH=your_favor' to the shell script (e.g. ~/.bash_profile or ~/.bashrc) Default: ~/.GetOrganelle
`--genes`	str	Followed with a customized database (a fasta file or the base name of a blast database) containing or made of ONE set of protein coding genes and ribosomal RNAs extracted from ONE reference genome that you want to assemble. Should be a list of databases split by comma(s) on a multi-organelle mode, with the same list length to organelle_type (followed by '-F'). This is optional for any organelle mentioned in '-F' but required for 'anonym'. By default, default database(s) in LBL_DB_PATH would be used contingent on the organelle types chosen (-F). The default value will become invalid when '--genes' or '--ex-genes' is used.
`--ex-genes`	str	This is optional and Not suggested, since non-target contigs could contribute information for better downstream coverage-based clustering. Followed with a customized database (a fasta file or the base name of a blast database) containing or made of protein coding genes and ribosomal RNAs extracted from reference genome(s) that you want to exclude. Could be a list of databases split by comma(s) but NOT required to have the same list length to organelle_type (followed by '-F'). The default value will become invalid when '--genes' or '--ex-genes' is used.
`--no-slim`		Disable slimming process and directly disentangle the original assembly graph. Default: False
`--slim-options`	str	Other options for calling slim_graph.py
`--max-slim-extending-len`	float	This is used to limit the extending length, below which a "non-hit contig" is allowed to be distant from a "hit contig" to be kept. See more under slim_graph.py:--max-slim-extending-len. Default: 15000 (-F embplant_pt), 50000 (-F embplant_mt/fungus_mt/other_pt), 12500 (-F embplant_nr/fungus_nr/animal_mt), maximum_of_type1_type2 (-F type1,type2), inf (-F anonym)
`--spades-out-dir`	str	Input spades output directory with 'scaffolds.fasta' and 'scaffolds.paths',
`--depth-factor`	float	Depth factor for differentiating the genome type of contigs. The genome type of contigs is determined by blast. Default: 10.0
`--type-f`	float	Type factor for identifying contig type tag when multiple tags exist in one contig. Default: 3.
`--contamination-depth`	float	Depth factor for confirming contamination in parallel contigs. Default: 3.
`--contamination-similarity`	float	Similarity threshold for confirming contaminating contigs. Default: 0.9
`--no-degenerate`		Disable making consensus from parallel contig based on nucleotide degenerate table.
`--degenerate-depth`	float	Depth factor for confirming parallel contigs. Default: 1.5
`--degenerate-similarity`	float	Similarity threshold for confirming parallel contigs. Default: 0.98
`--disentangle-time-limit`	int	Time limit (seconds) for each try of disentangling a graph file as a circular genome. Disentangling a graph as contigs is not limited. Default: 3600
`--expected-max-size`	str	Expected maximum target genome size(s) for disentangling. Should be a list of INTEGER numbers split by comma(s) on a multi-organelle mode, with the same list length to organelle_type (followed by '-F'). Default: 250000 (-F embplant_pt/fungus_mt), 25000 (-F embplant_nr/animal_mt/fungus_nr), 1000000 (-F embplant_mt/other_pt), 1000000,1000000,250000 (-F other_pt,embplant_mt,fungus_mt)
`--expected-min-size`	str	Expected minimum target genome size(s) for disentangling. Should be a list of INTEGER numbers split by comma(s) on a multi-organelle mode, with the same list length to organelle_type (followed by '-F'). Default: 10000 for all
`--reverse-lsc`		For '-F embplant_pt' with a complete circular result, by default, the direction of the starting contig (usually the LSC contig) is determined as the direction with fewer ORFs. Choose this option to reverse the direction of the starting contig when the result is circular. Both directions are biologically equivalent to each other. The reordering of the direction is only for easier downstream analysis.
`--max-paths-num`	int	Repeats would dramatically increase the number of potential isomers (paths). This option is used to export a certain number of paths out of all possible paths per assembly graph. Default: 1000
`--keep-all-polymorphic`		By default, this script would pick the contig with highest coverage among all parallel (polymorphic) contigs when degenerating was not applicable. Choose this flag to export all combinations.
`--min-sigma`	float	Minimum deviation factor for excluding non-target contigs. Default: 0.1
`--max-multiplicity`	int	Maximum multiplicity of contigs for disentangling genome paths. Should be 1~12. Default: 8
`-t`	int	Maximum threads to use.
`--prefix`	str	Add extra prefix to resulting files under the output directory.
`--which-blast`	str	Assign the path to BLAST binary files if not added to the path. Default: try GO_DEP_PATH/ncbi-blast first, then $PATH
`--which-bandage`	str	Assign the path to bandage binary file if not added to the path. Default: try $PATH
`--keep-temp`		Choose to keep the running temp/index files.
`--continue`		Checkpoints are based on the files produced, rather than on the log file, so keep in mind that this script will not detect differences between the current input parameters and the previous ones.
`--overwrite`		Overwrite previous files if they exist.
`--random-seed`	int	Default: 12345
`-v`		print the current version of GetOrganelle
`--verbose`		Verbose output. Choose to enable verbose running log_handler.
`-h`		print brief introduction for frequently-used options.
`--help`		print verbose introduction for all options.
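
Several options above (`--genes`, `--expected-max-size`, `--expected-min-size`) must be comma-separated lists with the same length as the `-F` list in multi-organelle mode. A quick pre-flight check can catch mismatches before a long run. This is a minimal sketch; the helper `mismatched_list_options` is hypothetical and is not provided by GetOrganelle.

```python
def mismatched_list_options(organelle_types, per_type_options):
    """Return the options whose list length differs from the -F list.

    organelle_types: the value passed to -F, e.g. "other_pt,embplant_mt".
    per_type_options: dict mapping option name -> comma-separated value.
    """
    expected = len(organelle_types.split(","))
    return [
        name
        for name, value in per_type_options.items()
        if len(value.split(",")) != expected
    ]

bad = mismatched_list_options(
    "other_pt,embplant_mt,fungus_mt",
    {
        # default listed above for this combination of types
        "--expected-max-size": "1000000,1000000,250000",
        # deliberately one value short, to show a mismatch
        "--expected-min-size": "10000,10000",
    },
)
print(bad)
```

Note that `--ex-genes` is documented as not requiring the same list length, so it should be left out of such a check.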

Utilities (Typing unfinished)

slim_graph.py

slim_graph.py is useful for choosing target graph components from an assembly graph file (fastg/gfa), and provides useful information for downstream analysis.

It runs blast and uses the hit table, contig connection information and other requirements (e.g. contig depth, extending length starting from the hit region) to keep target graph components, and stores the hit table as a Bandage-loadable tab file (although it ends with csv). It can also deal with plain fasta format assemblies without contig connection information, serving as a wrapper for filtering blast-hit contigs in or out.

As with all scripts in GetOrganelle, an introduction to the slim_graph.py arguments can be displayed with:

slim_graph.py -h

`-o`	str	Output file
`assemblies`	str	Input assembly (graph) files (.fasta or .gfa or .fastg). Please split the files by spaces.
`-F`	str	Followed by mode embplant_pt, other_pt, embplant_mt, embplant_nr, animal_mt, fungus_mt, or fungus_nr (meaning embryophyta plastome, non-embryophyta plastome, plant mitochondrion, plant nuclear ribosomal RNA, animal mitochondrion, fungus mitochondrion, and fungus nuclear ribosomal RNA, respectively), or a combination of the above split by comma(s). Each mode corresponds to certain arguments, e.g. embplant_pt corresponds to '--include-priority LBL_DB_PATH/embplant_pt.fasta'.
`-E`		Followed by mode embplant_pt, other_pt, embplant_mt, embplant_nr, animal_mt, fungus_mt, or fungus_nr (meaning embryophyta plastome, non-embryophyta plastome, plant mitochondrion, plant nuclear ribosomal RNA, animal mitochondrion, fungus mitochondrion, and fungus nuclear ribosomal RNA, respectively), or a combination of the above split by comma(s). Similar to -F; each mode corresponds to certain arguments, e.g. embplant_pt corresponds to '--exclude LBL_DB_PATH/embplant_pt.fasta'.
`--no-hits`		Treatment for non-hitting contigs. ex_no_con: keep those connected with hitting-include contigs (default). ex_no_hit: exclude all. keep_all: keep all.
`--max-slim-extending-len`	float	This is used to limit the extending length, below which a "non-hit contig" is allowed to be distant from a "hit contig" to be kept. This distance is measured by the shortest distance connecting those two contigs, weighted by the depth of the "hit contig". This is used only when "--no-hits ex_no_con" was chosen. Should be a single INTEGER number or inf (meaning infinite). It is supposed to be half of the maximum expected genome size to be safe, but could be much smaller if the label database is closely related. Default: 15000 (-F embplant_pt), 50000 (-F embplant_mt/fungus_mt/other_pt), 12500 (-F embplant_nr/fungus_nr/animal_mt), maximum_of_type1_type2 (-F type1,type2), inf (cases without using -F)
`--significant`	float	Within a contig, if the query-score of hit A is more than this many times (default: 3.0) the query-score of hit B, the contig is treated as related only to A, rather than to both.
`--depth-cutoff`	float	After detection of the target coverage, contigs with a depth beyond this multiple (depth cutoff) of the detected coverage are excluded. Default: 10000.0
`--min-depth`	float	Input a float or integer number. Filter fastg file by a minimum depth. Default: 0.
`--max-depth`	float	Input a float or integer number. Filter fastg file by a maximum depth. Default: inf.
`--merge`		Merge all possible contigs.
`--include`		followed by Blastn database(s)
`--include-priority`		followed by Blastn database(s).
`--exclude`		followed by Blastn database(s).
`--exclude-priority`		followed by Blastn database(s)
`--no-hits-labeled-tab`		Choose to disable producing tab file
`--keep-temp`		Choose to disable deleting temp files produced by blast and this script
`-o`	str	By default the output would be along with the input fastg file. But you could assign a new directory with this option.
`-e`	float	blastn evalue threshold. Default: 1e-25
`--prefix`	str	Add prefix to the output basename. Conflict with "--out-base".
`--out-base`		By default the output basename would be modified based on the input fastg file. But you could assign a new basename with this option. Conflict with "--prefix". Conflict with multiple input files!
`--log`		Generate log file.
`--wrapper`		Wrapper mode logging when called by get_organelle*.py. Default: False
`--verbose`		For debug usage.
`--continue`		Specified for calling from get_organelle_from_reads.py
`--no-overwrite`		Choose to not overwrite existing output result.
`--which-blast`	str	Assign the path to BLAST binary files if not added to the path. Default: try "" + os.path.realpath("GetOrganelleDep
`--config-dir`	str	The directory where the default databases were placed. The default value also can be changed by adding 'export GETORG_PATH=your_favor' to the shell script (e.g. ~/.bash_profile or ~/.bashrc) Default: GO_PATH
`-t`	int	Threads for blastn.
`-v`		print the current version of GetOrganelle
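
The `--significant` rule can be illustrated with a small function. This is a sketch of the rule as worded in the option list, not the code used by slim_graph.py: the function name `related_labels` is hypothetical, and applying the rule symmetrically (A over B as well as B over A) is an assumption.

```python
def related_labels(score_a, score_b, significant=3.0):
    """Decide which hit labels a contig keeps, given two query-scores.

    If one score is more than `significant` times the other, only the
    stronger label is kept; otherwise the contig is related to both.
    """
    if score_a > significant * score_b:
        return {"A"}
    if score_b > significant * score_a:   # symmetric case (assumption)
        return {"B"}
    return {"A", "B"}

print(related_labels(900, 100))   # A dominates
print(related_labels(200, 100))   # neither dominates
```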

summary_get_organelle_output.py

summary_get_organelle_output.py can be used to summarize get_organelle_from_reads.py output results into a CSV-formatted table.

`sample_folders`	str	Input a list of folders generated by get_organelle_from_reads.py. Please split the folders by spaces.
`-o`	str	Output csv file.
`--verbose`		Verbose style.
`-v`		print the current version of GetOrganelle
