Merge pull request #74 from Kinggerm/1.7.4.1
v1.7.4.1
Kinggerm committed Apr 16, 2021
2 parents 10d7b08 + 50917ba commit 5523e93
Showing 4 changed files with 106 additions and 89 deletions.
4 changes: 2 additions & 2 deletions GetOrganelleLib/assembly_parser.py
@@ -2965,8 +2965,8 @@ def add_gap_nodes_with_spades_res(self, scaffold_fasta, scaffold_paths, min_cov=
 length=len(new_seq),
 coverage=new_average_cov,
 forward_seq=new_seq,
-head_connections=OrderedDict([((l_name, l_end), None)]),
-tail_connections=OrderedDict([((r_name, r_end), None)]))
+head_connections=OrderedDict([((l_name, l_end), ctg_olp)]),
+tail_connections=OrderedDict([((r_name, r_end), ctg_olp)]))
 self.vertex_info[l_name].connections[l_end][(gap_name, False)] = ctg_olp
 self.vertex_info[r_name].connections[r_end][(gap_name, True)] = ctg_olp
 gap_added = True
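In the hunk above, the new gap vertex now stores `ctg_olp` on its own head and tail connections instead of `None`. As a result, each link between the gap and its flanking contigs carries the same overlap in both directions.

The sketch below illustrates that invariant. It uses a hypothetical `Vertex` class that only mimics the `connections[end][(name, end)] = overlap` layout visible in the diff; it is not GetOrganelle's own class. Reading `False` as head and `True` as tail is an assumption inferred from how the gap vertex is wired. `add_gap_vertex` and `asymmetric_links` are illustrative names.

```python
from collections import OrderedDict


class Vertex:
    """Hypothetical stand-in for a graph vertex (not GetOrganelle's class)."""

    def __init__(self, name):
        self.name = name
        # Assumed layout: connections[end][(other_name, other_end)] = overlap,
        # with end False = head and True = tail (inferred from the diff).
        self.connections = {False: OrderedDict(), True: OrderedDict()}


def add_gap_vertex(vertices, gap_name, left, right, overlap):
    """Insert a gap vertex between left=(name, end) and right=(name, end)."""
    l_name, l_end = left
    r_name, r_end = right
    gap = Vertex(gap_name)
    # 1.7.4.1 behaviour: the gap records the same overlap as its neighbours
    # (previously these two values were None).
    gap.connections[False][(l_name, l_end)] = overlap
    gap.connections[True][(r_name, r_end)] = overlap
    vertices[gap_name] = gap
    vertices[l_name].connections[l_end][(gap_name, False)] = overlap
    vertices[r_name].connections[r_end][(gap_name, True)] = overlap


def asymmetric_links(vertices):
    """List links whose reverse entry is missing or has a different overlap."""
    missing = object()
    problems = []
    for name, vertex in vertices.items():
        for end, links in vertex.connections.items():
            for (other, other_end), overlap in links.items():
                back = vertices[other].connections[other_end].get((name, end), missing)
                if back is missing or back != overlap:
                    problems.append(((name, end), (other, other_end)))
    return problems
```

Setting `graph["gap_1"].connections[False][("A", True)] = None` (the pre-1.7.4.1 value) makes `asymmetric_links` report that link from both of its ends.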
10 changes: 10 additions & 0 deletions GetOrganelleLib/versions.py
@@ -5,6 +5,15 @@ def get_versions():
 
 
 versions = [
+{
+"number": "1.7.4.1",
+"features": [
+"1. get_organelle_config.py: provide guidance for old code and new database incompatibility (reported by Wenxiang Liu@SWFU)",
+"2. assembly_parser.py: fix a bug after scaffolding with SPAdes path (introduced in 1.7.4 feature 5; reported by Robin van Velzen@WUR)",
+"3. update README.md with improved instruction",
+],
+"time": "2021-04-16 14:46 UTC+8"
+},
 {
 "number": "1.7.4",
 "features": [
@@ -18,6 +27,7 @@ def get_versions():
 "8. get_organelle_from_reads.py/disentangle_organelle_assembly.py: correct typos",
 "9. pipe_control_func.py: map_with_bowtie2: warn reads integrity; build_bowtie2_db: rm small index",
 "10. get_organelle_config.py: verbose log for bowtie2 and blast",
+"11. update README.md with a reframed instruction",
 ],
 "time": "2021-04-14 17:52 UTC+8"
 },
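The `versions` list in versions.py keeps one dict per release, with `number`, `features` and a `time` stamp written as `YYYY-MM-DD HH:MM UTC+8`. As this commit shows, each new release is prepended at the top.

Below is a minimal sketch of reading that structure. It assumes every stamp follows this pattern. `parse_release_time` and `latest_release` are hypothetical helpers, not functions in GetOrganelleLib, and the `features` lists are left empty here for brevity.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed shape, mirroring the two entries shown in versions.py (features omitted).
VERSIONS = [
    {"number": "1.7.4.1", "features": [], "time": "2021-04-16 14:46 UTC+8"},
    {"number": "1.7.4", "features": [], "time": "2021-04-14 17:52 UTC+8"},
]

# Local date/time followed by a whole-hour UTC offset, e.g. "UTC+8".
_STAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}) UTC([+-]\d{1,2})")


def parse_release_time(stamp):
    """Turn a stamp such as '2021-04-16 14:46 UTC+8' into an aware datetime."""
    match = _STAMP.fullmatch(stamp)
    if match is None:
        raise ValueError(f"unrecognised release time: {stamp!r}")
    local, offset = match.groups()
    tz = timezone(timedelta(hours=int(offset)))
    return datetime.strptime(local, "%Y-%m-%d %H:%M").replace(tzinfo=tz)


def latest_release(versions):
    """Return the entry with the most recent 'time', whatever the list order."""
    return max(versions, key=lambda entry: parse_release_time(entry["time"]))
```

Comparing by parsed time rather than by list position keeps the lookup correct even if an entry is ever inserted out of order.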
80 changes: 40 additions & 40 deletions README.md
@@ -6,7 +6,7 @@
 [![Anaconda-Server Badge](https://anaconda.org/bioconda/getorganelle/badges/downloads.svg)](https://anaconda.org/bioconda/getorganelle)
 
 [![GitHub release](https://img.shields.io/github/release/Kinggerm/GetOrganelle.svg)](https://github.com/Kinggerm/GetOrganelle/releases/)
-[![GitHub version](https://img.shields.io/github/commits-since/Kinggerm/GetOrganelle/1.7.4.svg)](https://github.com/Kinggerm/GetOrganelle/commit/master)
+[![GitHub version](https://img.shields.io/github/commits-since/Kinggerm/GetOrganelle/1.7.4.1.svg)](https://github.com/Kinggerm/GetOrganelle/commit/master)
 
 This toolkit assemblies organelle genome from genomic skimming data.
 
@@ -94,36 +94,36 @@ But you are still highly recommended to read the following minimal introductions
 Since v1.6.2, `get_organelle_from_reads.py` will automatically estimate the read data it needs, without user assignment nor data reducing (see flags `--reduce-reads-for-coverage` and `--max-reads`).
 
 * <b>Main Options</b>
-
-Take your input seed (fasta format; if `-s` was not provided,
-the default is `GetOrganelleLib/SeedDatabase/*.fasta`) as probe,
-the script would recruit target reads in successive rounds (extending process).
-The default seed works for most samples, but using a complete organelle genome sequence of a related species as the seed would help the assembly in many cases
-(e.g. degraded DNA samples, fastly-evolving in animal/fungal samples).
-
-The value word size (followed with `-w`), like the kmer in assembly, is crucial to the feasibility and efficiency of this process.
-The best word size changes upon data and will be affected by read length, read quality, base coverage, organ DNA percent and other factors.
-By default, GetOrganelle would automatically estimate a proper word size based on the data characters.
-Although the automatically-estimated word size value does not ensure the best performance nor the best result,
-you do not need to adjust this value (`-w`) if a complete/circular organelle genome assembly is produced,
-because the circular result generated by GetOrganelle is highly consistent under different options and seeds.
-The automatically estimated word size may be screwy in some animal mitogenome data due to inaccurate coverage estimation,
-for which you fine-tune it instead.
-
-The best kmer(s) depend on a wide variety of factors too.
-Although more kmer values add the time consuming, you are recommended to use a wide range of kmers to benefit from the power of SPAdes.
-Empirically, you should include at least including one small kmer (e.g. `21`) and one large kmer (`105`) for a successful organelle genome assembly.
+* `-w` The value word size, like the kmer in assembly, is crucial to the feasibility and efficiency of this process.
+The best word size changes upon data and will be affected by read length, read quality, base coverage, organ DNA percent and other factors.
+By default, GetOrganelle would automatically estimate a proper word size based on the data characters.
+Although the automatically-estimated word size value does not ensure the best performance nor the best result,
+you do not need to adjust this value (`-w`) if a complete/circular organelle genome assembly is produced,
+because the circular result generated by GetOrganelle is highly consistent under different options and seeds.
+The automatically estimated word size may be screwy in some animal mitogenome data due to inaccurate coverage estimation,
+for which you fine-tune it instead.
+* `-k` The best kmer(s) depend on a wide variety of factors too.
+Although more kmer values add the time consuming, you are recommended to use a wide range of kmers to benefit from the power of SPAdes.
+Empirically, you should include at least including one small kmer (e.g. `21`) and one large kmer (`105`) for a successful organelle genome assembly.
+* `-s` GetOrganelle takes the seed (fasta format; if this was not provided,
+the default is `GetOrganelleLib/SeedDatabase/*.fasta`) as probe,
+the script would recruit target reads in successive rounds (extending process).
+The default seed works for most samples, but using a complete organelle genome sequence of a related species as the seed would help the assembly in many cases
+(e.g. degraded DNA samples, fastly-evolving in animal/fungal samples; see more [here](https://github.com/Kinggerm/GetOrganelle/wiki/FAQ#how-to-assemble-a-target-organelle-genome-using-my-own-reference)).
 
 * <b>Key Results</b>
 
-The key output files include
+The key output files include
 
-* `*.path_sequence.fasta`, each fasta file represents one type of genome structure
-* `*.selected_graph.gfa`, the [organelle-only assembly graph](https://github.com/Kinggerm/GetOrganelle/wiki/Terminology)
-* `get_org.log.txt`, the log file
-* `extended_K*.assembly_graph.fastg`, the raw assembly graph
-* `extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.fastg`, a simplified assembly graph
-* `extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.csv`, a tab-format contig label file for bandage visualization
+* `*.path_sequence.fasta`, each fasta file represents one type of genome structure
+* `*.selected_graph.gfa`, the [organelle-only assembly graph](https://github.com/Kinggerm/GetOrganelle/wiki/Terminology)
+* `get_org.log.txt`, the log file
+* `extended_K*.assembly_graph.fastg`, the raw assembly graph
+* `extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.fastg`, a simplified assembly graph
+* `extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.csv`, a tab-format contig label file for bandage visualization
 
 You may delete the files other than above if the resulting genome is complete (indicated in the log file and the name of the `*.fasta`).
 You are expected to obtain the complete organelle genome assembly for most animal/fungal mitogenomes and plant chloroplast genomes
@@ -139,22 +139,22 @@ But you are still highly recommended to read the following minimal introductions
 
 * <b>Input data & Main Options</b>
 
-The input must be a FASTG or GFA formatted assembly graph file.
-
-If you input an assembly graph assembled from total DNA sequencing using third-party a de novo assembler (e.g. Velvet),
-the assembly graph may includes a great amount of non-target contigs.
-You may want to use `--min-depth` and `--max-depth` to greatly reduce the computational burden for target extraction.
-
-If you input an [organelle-equivalent assembly graph](https://github.com/Kinggerm/GetOrganelle/wiki/Terminology)
-(e.g. manually curated and exported using Bandage), you may use `--no-slim`.
+* `-g` The input must be a FASTG or GFA formatted assembly graph file.
+* If you input an assembly graph assembled from total DNA sequencing using third-party a de novo assembler (e.g. Velvet),
+the assembly graph may includes a great amount of non-target contigs.
+You may want to use `--min-depth` and `--max-depth` to greatly reduce the computational burden for target extraction.
+* If you input an [organelle-equivalent assembly graph](https://github.com/Kinggerm/GetOrganelle/wiki/Terminology)
+(e.g. manually curated and exported using Bandage), you may use `--no-slim`.
 
 * <b>Key Results</b>
 
-The key output files include
-
-* `*.path_sequence.fasta`, one fasta file represents one type of genome structure
-* `*.selected_graph.gfa`, the [organelle-only assembly graph](https://github.com/Kinggerm/GetOrganelle/wiki/Terminology)
-* `get_org.log.txt`, the log file
+The key output files include
+* `*.path_sequence.fasta`, one fasta file represents one type of genome structure
+* `*.selected_graph.gfa`, the [organelle-only assembly graph](https://github.com/Kinggerm/GetOrganelle/wiki/Terminology)
+* `get_org.log.txt`, the log file
 
 
 ### GetOrganelle flowchart
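The reworked README "Main Options" for `get_organelle_from_reads.py` recommend a few practices:

* leave `-w` to the automatic estimate unless it misbehaves;
* pass a wide `-k` range that includes at least one small kmer (e.g. `21`) and one large kmer (e.g. `105`);
* optionally supply a related-species genome with `-s`.

Below is a hedged sketch of a wrapper that assembles such a command line. `build_reads_command` is a hypothetical helper, not part of GetOrganelle. Treating 21 and 105 as hard thresholds is an assumption that turns the README's examples into a check. The odd, below-128 constraint is SPAdes' own rule for `-k` values.

```python
def build_reads_command(fq1, fq2, out_dir, organelle_type="embplant_pt",
                        kmers=(21, 45, 65, 85, 105), word_size=None, seed_fasta=None):
    """Return an argument list for get_organelle_from_reads.py (hypothetical helper)."""
    ks = sorted(set(int(k) for k in kmers))
    # SPAdes requires odd kmer sizes below 128.
    if any(k % 2 == 0 or k >= 128 for k in ks):
        raise ValueError("SPAdes kmers must be odd and below 128")
    # README rule of thumb: span at least one small and one large kmer.
    # Using 21/105 as hard limits is an assumption, not a GetOrganelle rule.
    if ks[0] > 21 or ks[-1] < 105:
        raise ValueError("include a small kmer (e.g. 21) and a large kmer (e.g. 105)")
    cmd = ["get_organelle_from_reads.py", "-1", fq1, "-2", fq2, "-o", out_dir,
           "-F", organelle_type, "-k", ",".join(str(k) for k in ks)]
    if word_size is not None:  # only override -w when the automatic estimate misbehaves
        cmd += ["-w", str(word_size)]
    if seed_fasta is not None:  # e.g. a complete genome of a related species
        cmd += ["-s", seed_fasta]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems with file paths.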
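The "Key Results" list in the updated README names the files worth keeping. Everything else may be deleted once the log and the `*.fasta` name indicate a complete genome.

Below is a minimal sketch that only classifies file names against those patterns and deliberately deletes nothing. The patterns are copied from the list. The `extend_embplant_pt-embplant_mt` suffix matches that example and is assumed to change with other organelle types. `split_key_files` is a hypothetical helper, and the file names in the test are illustrative.

```python
from fnmatch import fnmatchcase

# Patterns copied from the README's "Key Results" list. The
# extend_embplant_pt-embplant_mt suffix reflects that example (assumed to vary).
KEY_PATTERNS = (
    "*.path_sequence.fasta",
    "*.selected_graph.gfa",
    "get_org.log.txt",
    "extended_K*.assembly_graph.fastg",
    "extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.fastg",
    "extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.csv",
)


def split_key_files(filenames, patterns=KEY_PATTERNS):
    """Split names into (key files, other files); nothing is removed from disk."""
    keep, other = [], []
    for name in filenames:
        # fnmatchcase: case-sensitive on every platform, unlike fnmatch.fnmatch.
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            keep.append(name)
        else:
            other.append(name)
    return keep, other
```

Feeding it `os.listdir(output_dir)` gives a list of candidates to review before any manual clean-up.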

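For `get_organelle_from_assembly.py`, the README suggests `--min-depth` and `--max-depth` to shrink a total-DNA assembly graph before target extraction. The idea is to discard contigs whose depth falls outside a window.

Below is a minimal sketch of that idea on SPAdes-style FASTG headers (`EDGE_<id>_length_<len>_cov_<cov>`). It is an illustration of depth-window filtering only, not GetOrganelle's implementation. It does not handle GFA input or other assemblers' naming schemes, and `edges_in_depth_range` is a hypothetical helper.

```python
import re

# SPAdes-style FASTG edge name: EDGE_<id>_length_<len>_cov_<coverage>
_EDGE = re.compile(r"EDGE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def edges_in_depth_range(fastg_lines, min_depth=0.0, max_depth=float("inf")):
    """Return {edge_id: coverage} for FASTG records within [min_depth, max_depth]."""
    kept = {}
    for line in fastg_lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no depth information
        # The record's own name precedes ':' (neighbour list) or ';' (end of header).
        name = re.split(r"[:;]", line[1:], maxsplit=1)[0]
        match = _EDGE.match(name)  # also matches reverse-complement names ending in "'"
        if match is None:
            continue
        edge_id, coverage = int(match.group(1)), float(match.group(3))
        if min_depth <= coverage <= max_depth:
            kept[edge_id] = coverage
    return kept
```

Keying the result by edge id collapses each edge and its reverse-complement record into one entry.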