Merge pull request #74 from Kinggerm/1.7.4.1

v1.7.4.1
Kinggerm · Apr 16, 2021 · 5523e93
2 parents 10d7b08 + 50917ba

Showing 4 changed files with 106 additions and 89 deletions.
diff --git a/GetOrganelleLib/assembly_parser.py b/GetOrganelleLib/assembly_parser.py
@@ -2965,8 +2965,8 @@ def add_gap_nodes_with_spades_res(self, scaffold_fasta, scaffold_paths, min_cov=
                                                         length=len(new_seq),
                                                         coverage=new_average_cov,
                                                         forward_seq=new_seq,
-                                                        head_connections=OrderedDict([((l_name, l_end), None)]),
-                                                        tail_connections=OrderedDict([((r_name, r_end), None)]))
+                                                        head_connections=OrderedDict([((l_name, l_end), ctg_olp)]),
+                                                        tail_connections=OrderedDict([((r_name, r_end), ctg_olp)]))
                     self.vertex_info[l_name].connections[l_end][(gap_name, False)] = ctg_olp
                     self.vertex_info[r_name].connections[r_end][(gap_name, True)] = ctg_olp
                     gap_added = True
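
Reviewer note: the hunk above makes the new gap vertex store the same `ctg_olp` value that its two neighbours already record for it, instead of `None`. The sketch below illustrates why a symmetric record matters, under stated assumptions: `Vertex`, `add_gap_vertex`, and `asymmetric_edges` are hypothetical, stripped-down stand-ins written for this note, not the classes or methods in `GetOrganelleLib/assembly_parser.py`, and the `True`/`False` keys for tail/head ends are inferred from the diff.

```python
from collections import OrderedDict


class Vertex:
    """Hypothetical minimal vertex; NOT the real GetOrganelleLib class."""

    def __init__(self, name, head_connections=None, tail_connections=None):
        self.name = name
        # Assumption: True indexes the tail end, False the head end.
        self.connections = {
            True: tail_connections or OrderedDict(),
            False: head_connections or OrderedDict(),
        }


def add_gap_vertex(vertex_info, gap_name, left, right, overlap):
    """Insert a gap vertex between two vertex ends, recording the overlap on both sides."""
    l_name, l_end = left
    r_name, r_end = right
    vertex_info[gap_name] = Vertex(
        gap_name,
        head_connections=OrderedDict([((l_name, l_end), overlap)]),
        tail_connections=OrderedDict([((r_name, r_end), overlap)]),
    )
    # Mirror the edges on the neighbouring vertices, as the diff context does.
    vertex_info[l_name].connections[l_end][(gap_name, False)] = overlap
    vertex_info[r_name].connections[r_end][(gap_name, True)] = overlap


def asymmetric_edges(vertex_info):
    """Return directed edges whose overlap differs from the reverse direction."""
    bad = []
    for name, vertex in vertex_info.items():
        for end, neighbours in vertex.connections.items():
            for (other_name, other_end), overlap in neighbours.items():
                reverse = vertex_info[other_name].connections[other_end].get((name, end))
                if reverse != overlap:
                    bad.append(((name, end), (other_name, other_end)))
    return bad


vertex_info = {"A": Vertex("A"), "B": Vertex("B")}
add_gap_vertex(vertex_info, "gap1", ("A", True), ("B", False), overlap=77)
print(asymmetric_edges(vertex_info))
```

With the pre-fix behaviour (the gap side holding `None` while the neighbour holds the overlap), a check like `asymmetric_edges` would flag both directions of that edge, which is presumably the kind of inconsistency that later graph processing tripped over.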

diff --git a/GetOrganelleLib/versions.py b/GetOrganelleLib/versions.py
@@ -5,6 +5,15 @@ def get_versions():
 
 
 versions = [
+    {
+        "number": "1.7.4.1",
+        "features": [
+            "1. get_organelle_config.py: provide guidance for old code and new database incompatibility (reported by Wenxiang Liu@SWFU)",
+            "2. assembly_parser.py: fix a bug after scaffolding with SPAdes path (introduced in 1.7.4 feature 5; reported by Robin van Velzen@WUR)",
+            "3. update README.md with improved instruction",
+        ],
+        "time": "2021-04-16 14:46 UTC+8"
+    },
     {
         "number": "1.7.4",
         "features": [
@@ -18,6 +27,7 @@ def get_versions():
             "8. get_organelle_from_reads.py/disentangle_organelle_assembly.py: correct typos",
             "9. pipe_control_func.py: map_with_bowtie2: warn reads integrity; build_bowtie2_db: rm small index",
             "10. get_organelle_config.py: verbose log for bowtie2 and blast",
+            "11. update README.md with a reframed instruction",
         ],
         "time": "2021-04-14 17:52 UTC+8"
     },
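
Reviewer note: the new entry is prepended, so the list stays newest-first. A minimal sketch of how such a list can be sanity-checked follows; `version_key`, `latest_version`, and `is_newest_first` are hypothetical helpers written for this note, not functions from `GetOrganelleLib/versions.py`, and the two entries are trimmed copies of the ones in the hunk above.

```python
# Trimmed copies of the two newest entries shown in the diff.
versions = [
    {
        "number": "1.7.4.1",
        "features": ["3. update README.md with improved instruction"],
        "time": "2021-04-16 14:46 UTC+8",
    },
    {
        "number": "1.7.4",
        "features": ["11. update README.md with a reframed instruction"],
        "time": "2021-04-14 17:52 UTC+8",
    },
]


def version_key(number):
    """Turn a dotted version such as '1.7.4.1' into a comparable tuple of ints."""
    return tuple(int(part) for part in number.split("."))


def latest_version(entries):
    """Assumes the list is kept newest-first, as in the hunk above."""
    return entries[0]["number"]


def is_newest_first(entries):
    """Check that version numbers strictly decrease from top to bottom."""
    keys = [version_key(entry["number"]) for entry in entries]
    return all(a > b for a, b in zip(keys, keys[1:]))


print(latest_version(versions), is_newest_first(versions))
```

Comparing integer tuples rather than raw strings keeps `1.7.4.1` ordered after `1.7.4` and avoids lexical surprises such as `1.10` sorting before `1.9`.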

diff --git a/README.md b/README.md
@@ -6,7 +6,7 @@
 [![Anaconda-Server Badge](https://anaconda.org/bioconda/getorganelle/badges/downloads.svg)](https://anaconda.org/bioconda/getorganelle)
 
 [![GitHub release](https://img.shields.io/github/release/Kinggerm/GetOrganelle.svg)](https://github.com/Kinggerm/GetOrganelle/releases/)
-[![GitHub version](https://img.shields.io/github/commits-since/Kinggerm/GetOrganelle/1.7.4.svg)](https://github.com/Kinggerm/GetOrganelle/commit/master)
+[![GitHub version](https://img.shields.io/github/commits-since/Kinggerm/GetOrganelle/1.7.4.1.svg)](https://github.com/Kinggerm/GetOrganelle/commit/master)
 
 This toolkit assemblies organelle genome from genomic skimming data. 
 
@@ -94,36 +94,36 @@ But you are still highly recommended to read the following minimal introductions
   Since v1.6.2, `get_organelle_from_reads.py` will automatically estimate the read data it needs, without user assignment nor data reducing (see flags `--reduce-reads-for-coverage` and `--max-reads`). 
 
   * <b>Main Options</b>
-
-  Take your input seed (fasta format; if `-s` was not provided, 
-  the default is `GetOrganelleLib/SeedDatabase/*.fasta`) as probe, 
-  the script would recruit target reads in successive rounds (extending process). 
-  The default seed works for most samples, but using a complete organelle genome sequence of a related species as the seed would help the assembly in many cases 
-  (e.g. degraded DNA samples, fastly-evolving in animal/fungal samples). 
-
-  The value word size (followed with `-w`), like the kmer in assembly, is crucial to the feasibility and efficiency of this process. 
-  The best word size changes upon data and will be affected by read length, read quality, base coverage, organ DNA percent and other factors. 
-  By default, GetOrganelle would automatically estimate a proper word size based on the data characters. 
-  Although the automatically-estimated word size value does not ensure the best performance nor the best result, 
-  you do not need to adjust this value (`-w`) if a complete/circular organelle genome assembly is produced, 
-  because the circular result generated by GetOrganelle is highly consistent under different options and seeds. 
-  The automatically estimated word size may be screwy in some animal mitogenome data due to inaccurate coverage estimation, 
-  for which you fine-tune it instead. 
-
-  The best kmer(s) depend on a wide variety of factors too. 
-  Although more kmer values add the time consuming, you are recommended to use a wide range of kmers to benefit from the power of SPAdes. 
-  Empirically, you should include at least including one small kmer (e.g. `21`) and one large kmer (`105`) for a successful organelle genome assembly.
+    
+    * `-w` The value word size, like the kmer in assembly, is crucial to the feasibility and efficiency of this process. 
+    The best word size changes upon data and will be affected by read length, read quality, base coverage, organ DNA percent and other factors. 
+    By default, GetOrganelle would automatically estimate a proper word size based on the data characters. 
+    Although the automatically-estimated word size value does not ensure the best performance nor the best result, 
+    you do not need to adjust this value (`-w`) if a complete/circular organelle genome assembly is produced, 
+    because the circular result generated by GetOrganelle is highly consistent under different options and seeds. 
+    The automatically estimated word size may be screwy in some animal mitogenome data due to inaccurate coverage estimation, 
+    for which you fine-tune it instead. 
+    
+    * `-k` The best kmer(s) depend on a wide variety of factors too. 
+    Although more kmer values add the time consuming, you are recommended to use a wide range of kmers to benefit from the power of SPAdes. 
+    Empirically, you should include at least including one small kmer (e.g. `21`) and one large kmer (`105`) for a successful organelle genome assembly.
+    
+    * `-s` GetOrganelle takes the seed (fasta format; if this was not provided, 
+    the default is `GetOrganelleLib/SeedDatabase/*.fasta`) as probe, 
+    the script would recruit target reads in successive rounds (extending process). 
+    The default seed works for most samples, but using a complete organelle genome sequence of a related species as the seed would help the assembly in many cases 
+    (e.g. degraded DNA samples, fastly-evolving in animal/fungal samples; see more [here](https://github.com/Kinggerm/GetOrganelle/wiki/FAQ#how-to-assemble-a-target-organelle-genome-using-my-own-reference)). 
 
   * <b>Key Results</b>
 
-  The key output files include
+    The key output files include
 
-   * `*.path_sequence.fasta`, each fasta file represents one type of genome structure
-   * `*.selected_graph.gfa`, the [organelle-only assembly graph](https://github.com/Kinggerm/GetOrganelle/wiki/Terminology)
-   * `get_org.log.txt`, the log file
-   * `extended_K*.assembly_graph.fastg`, the raw assembly graph
-   * `extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.fastg`, a simplified assembly graph 
-   * `extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.csv`, a tab-format contig label file for bandage visualization
+    * `*.path_sequence.fasta`, each fasta file represents one type of genome structure
+    * `*.selected_graph.gfa`, the [organelle-only assembly graph](https://github.com/Kinggerm/GetOrganelle/wiki/Terminology)
+    * `get_org.log.txt`, the log file
+    * `extended_K*.assembly_graph.fastg`, the raw assembly graph
+    * `extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.fastg`, a simplified assembly graph 
+    * `extended_K*.assembly_graph.fastg.extend_embplant_pt-embplant_mt.csv`, a tab-format contig label file for bandage visualization
 
   You may delete the files other than above if the resulting genome is complete (indicated in the log file and the name of the `*.fasta`). 
   You are expected to obtain the complete organelle genome assembly for most animal/fungal mitogenomes and plant chloroplast genomes 
@@ -139,22 +139,22 @@ But you are still highly recommended to read the following minimal introductions
 
   * <b>Input data & Main Options</b>
 
-  The input must be a FASTG or GFA formatted assembly graph file. 
-
-  If you input an assembly graph assembled from total DNA sequencing using third-party a de novo assembler (e.g. Velvet), 
-  the assembly graph may includes a great amount of non-target contigs. 
-  You may want to use `--min-depth` and `--max-depth` to greatly reduce the computational burden for target extraction.
-
-  If you input an [organelle-equivalent assembly graph](https://github.com/Kinggerm/GetOrganelle/wiki/Terminology) 
-  (e.g. manually curated and exported using Bandage), you may use `--no-slim`.
+    * `-g` The input must be a FASTG or GFA formatted assembly graph file. 
+    
+    * If you input an assembly graph assembled from total DNA sequencing using third-party a de novo assembler (e.g. Velvet), 
+    the assembly graph may includes a great amount of non-target contigs. 
+    You may want to use `--min-depth` and `--max-depth` to greatly reduce the computational burden for target extraction.
+    
+    * If you input an [organelle-equivalent assembly graph](https://github.com/Kinggerm/GetOrganelle/wiki/Terminology) 
+    (e.g. manually curated and exported using Bandage), you may use `--no-slim`.
 
   * <b>Key Results</b>
 
-  The key output files include
-
-   * `*.path_sequence.fasta`, one fasta file represents one type of genome structure
-   * `*.selected_graph.gfa`, the [organelle-only assembly graph](https://github.com/Kinggerm/GetOrganelle/wiki/Terminology)
-   * `get_org.log.txt`, the log file
+    The key output files include
+    
+    * `*.path_sequence.fasta`, one fasta file represents one type of genome structure
+    * `*.selected_graph.gfa`, the [organelle-only assembly graph](https://github.com/Kinggerm/GetOrganelle/wiki/Terminology)
+    * `get_org.log.txt`, the log file
 
 
 ### GetOrganelle flowchart