Releases · iquasere/reCOGnizer

12 Sep 15:26

iquasere

1.11.0

80d25f5

Databases' names changed to CD-batch search options

Latest

The database names passed to the --databases parameter have changed to accommodate the options available in CDD Batch Search. The new options are:

NCBI_Curated
Pfam
SMART
KOG
COG
PRK
TIGR
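
As an illustration only, the new names could be validated before a run with something like the sketch below. The function name `parse_databases` and the comma-separated input format are assumptions made for this example, not reCOGnizer's actual code.

```python
# The seven names listed in the release notes above.
VALID_DATABASES = ("NCBI_Curated", "Pfam", "SMART", "KOG", "COG", "PRK", "TIGR")


def parse_databases(value):
    """Split a comma-separated string (assumed format) and validate each name."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in VALID_DATABASES]
    if unknown:
        raise ValueError(
            f"Unknown database(s): {', '.join(unknown)}. "
            f"Valid options: {', '.join(VALID_DATABASES)}"
        )
    return names


print(parse_databases("Pfam, COG"))  # ['Pfam', 'COG']
```

Names are matched case-sensitively here, so `SMART` is accepted while `Smart` is rejected; whether reCOGnizer itself is that strict is not stated in these notes.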

Domains now follow the lists at the PN files provided by NCBI

Domains related to the NCBI_Curated and PRK databases were not all being considered when building databases. This has been fixed, in accordance with the PN files provided with cdd.tar.gz.

Database construction reimplemented to use the PN files provided by CDD

If those are not available, reCOGnizer will still build the PN files, but with more added domains.

This should fix #19, but let's see.

Also removed deprecated parameters

The --download-resources and --skip-downloaded parameters now result in an error when specified.


30 Dec 13:10

iquasere

1.10.1

9eb6a9b

Fix on regex search of EC numbers

re.escape is required when the regex pattern is built by concatenating strings.

E.g. it is needed to treat ) as a literal character when searching for (1.1.1.1) in the function in question.

This problem was caused by using the new r"regex" format.
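
A minimal sketch of the issue, assuming a hypothetical helper `find_ec_in_parentheses` (the actual reCOGnizer function is not shown in these notes). Without escaping, the parentheses form a regex group and each dot matches any character; with re.escape they are all matched literally.

```python
import re


def find_ec_in_parentheses(text, ec_number):
    """Return True if `text` contains the EC number wrapped in literal parentheses."""
    # re.escape makes "(", ")" and "." match themselves instead of acting as
    # regex metacharacters.
    pattern = re.escape("(" + ec_number + ")")
    return re.search(pattern, text) is not None


# Unescaped: the parentheses are a group and "." matches any character,
# so an unrelated string is matched by mistake.
print(re.search(r"(1.1.1.1)", "1a1b1c1") is not None)  # True

# Escaped: only the literal "(1.1.1.1)" is accepted.
print(find_ec_in_parentheses("alcohol dehydrogenase (1.1.1.1)", "1.1.1.1"))  # True
print(find_ec_in_parentheses("1a1b1c1", "1.1.1.1"))  # False
```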


28 Dec 11:47

iquasere

1.10.0

1798736

Simpler download of databases and more robust COG2KO conversion

Much simpler download of databases

reCOGnizer relied on the --download-resources and --skip-downloaded parameters for setting up its databases.

--download-resources instructed reCOGnizer to download the files required for its execution, and --skip-downloaded instructed it to skip files that had already been downloaded, which was useful when, for example, a single file had been removed by mistake.

Now, reCOGnizer relies on the recognizer_dwnl.timestamp to check if databases have already been downloaded. If the file exists, it skips installation. If the file doesn't exist, reCOGnizer will remove all available files, and download everything.
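
The logic described above can be sketched roughly as follows. Only the file name recognizer_dwnl.timestamp comes from the release notes; the function `ensure_resources`, the `download` callable and the content written to the timestamp file are assumptions made for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path

TIMESTAMP_NAME = "recognizer_dwnl.timestamp"  # file name taken from the release notes


def ensure_resources(resources_dir, download):
    """Download everything unless the timestamp file exists.

    `download` is a hypothetical callable that fetches all files into
    `resources_dir`. Returns True if a download was performed.
    """
    resources_dir = Path(resources_dir)
    timestamp = resources_dir / TIMESTAMP_NAME
    if timestamp.exists():
        return False  # databases already set up: skip installation
    # No timestamp: remove whatever is there and start from scratch.
    if resources_dir.exists():
        shutil.rmtree(resources_dir)
    resources_dir.mkdir(parents=True)
    download(resources_dir)
    # Written last, so an interrupted download is repeated on the next run.
    timestamp.write_text(datetime.now().isoformat())
    return True
```

Writing the timestamp only after the download completes is a design choice of this sketch: a partially downloaded directory never looks complete.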

COG2KO conversion more reliable

Previously, reCOGnizer built the cog2ko conversion as a collection of all KOs available for each protein mapping to the specific COG.

Now, reCOGnizer uses an approach similar to the cog2ec conversion: it only assigns a KO to a COG when over half of the instances of that COG have that particular KO.

This obtains a more reliable COG2KO conversion, while keeping KOs for a considerable number of COGs.
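
A minimal sketch of such a majority rule, under stated assumptions: the input is a list of (COG, KO) pairs, one per protein, and proteins without a KO still count as instances of their COG. The function name and the toy identifiers are illustrative, not taken from reCOGnizer.

```python
from collections import Counter, defaultdict


def majority_cog2ko(protein_annotations):
    """Map each COG to a KO only if more than half of its instances carry that KO.

    `protein_annotations` is an iterable of (cog, ko) pairs; `ko` may be None.
    """
    kos_per_cog = defaultdict(list)
    for cog, ko in protein_annotations:
        kos_per_cog[cog].append(ko)

    cog2ko = {}
    for cog, kos in kos_per_cog.items():
        counts = Counter(ko for ko in kos if ko is not None)
        if not counts:
            continue  # no protein of this COG has a KO
        ko, count = counts.most_common(1)[0]
        if count > len(kos) / 2:  # strictly over half of all instances
            cog2ko[cog] = ko
    return cog2ko


# Toy data, not real mappings.
pairs = [
    ("COG0001", "K01845"), ("COG0001", "K01845"), ("COG0001", "K00001"),
    ("COG0002", "K00002"), ("COG0002", "K00003"),
]
print(majority_cog2ko(pairs))  # {'COG0001': 'K01845'}
```

In the toy data, COG0002 gets no KO because neither candidate exceeds half of its two instances, which is the trade-off the release note describes: fewer COGs keep a KO, but the ones that do are better supported.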

Also removes the intermediate ssv files outputted during construction of the cog2ko database.

New parameters --test-run and --output-rpsbproc-columns will usually not be needed

The --test-run parameter had to be implemented as a consequence of the simpler database download. When set, reCOGnizer runs in an abnormal fashion, which is required for the tests at GitHub. reCOGnizer will move the cdd.tar.gz file available in the repo, and use it as a valid cdd.tar.gz file.

--output-rpsbproc-columns will output the Superfamilies, Sites and Motifs columns, which are empty for almost all annotations.

Removed some unnecessary files

recognizer.log was produced in the working directory. It only included rpsblast outputs, mainly for error assessment. Users can obtain that information by running reCOGnizer with the --debug parameter, and manually running the faulty commands.

taxonomy.rdf was obtained as part of building taxonomy.tsv. Now, reCOGnizer removes it once it is no longer needed.

Some fixes

reCOGnizer was not reporting the download of files when the --quiet flag was set, except when the files had already been downloaded, and it removed them.

Also updated regexes to the new raw-string format, r'regex'.


08 Nov 14:39

iquasere

1.9.4

d466aad

Fixed KOG outputting

rpsbproc doesn't work with the KOG database.
reCOGnizer's KOG report is now built directly from the BLAST tabular output (format 6).
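
For readers unfamiliar with that format, here is a hedged sketch of reading it, assuming "BLAST 6" refers to BLAST+ tabular output (-outfmt 6) with its twelve default columns. The helper names and the example hits are made up for illustration and do not reflect how reCOGnizer builds its report.

```python
import csv
import io

# Default columns of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_blast6(handle):
    """Parse tab-separated BLAST output into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle, fieldnames=BLAST6_COLUMNS, delimiter="\t"):
        for column in ("pident", "evalue", "bitscore"):
            row[column] = float(row[column])
        rows.append(row)
    return rows


def best_hits(rows):
    """Keep the highest-bitscore hit for each query sequence."""
    best = {}
    for row in rows:
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


# Made-up hits, for illustration only.
EXAMPLE = (
    "prot1\tKOG0001\t45.0\t200\t110\t0\t1\t200\t1\t200\t1e-50\t180.0\n"
    "prot1\tKOG0002\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-80\t250.0\n"
    "prot2\tKOG0003\t30.0\t150\t105\t2\t5\t154\t3\t152\t1e-10\t60.0\n"
)
hits = best_hits(read_blast6(io.StringIO(EXAMPLE)))
print(hits["prot1"]["sseqid"])  # KOG0002
```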


15 Sep 14:02

iquasere

1.9.3

863b089

Fix when only downloading resources

reCOGnizer wasn't properly checking if the --file parameter had been provided. Therefore, reCOGnizer still attempted to perform annotation and searched for annotation outputs when no --file argument was specified.

Now, it's working properly.


29 Aug 14:27

iquasere

1.9.2

ea82552

Custom databases workflow now multithreaded

Now works multithreaded

Removed -db parameter. Incorporated into -dbs.
--custom-database changed to --custom-databases to reflect this change.
Added input sanitization for custom/default databases. Custom and default databases cannot be used in the same run; only one kind can be used at a time.
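
One possible shape for such a check is sketched below. The function `classify_databases` and the idea of telling the two kinds apart by membership in a set of default names are assumptions for this example; the release notes do not describe how reCOGnizer actually distinguishes them.

```python
def classify_databases(requested, default_names):
    """Return ("default", names) or ("custom", names); reject a mix of both.

    `requested` is the list of values given by the user, and `default_names`
    is the set of database names the tool ships with (hypothetical here).
    """
    defaults = [db for db in requested if db in default_names]
    customs = [db for db in requested if db not in default_names]
    if defaults and customs:
        raise ValueError(
            "Default and custom databases cannot be combined: "
            f"defaults={defaults}, custom={customs}"
        )
    return ("custom", customs) if customs else ("default", defaults)


# Hypothetical default names and paths, for illustration only.
DEFAULTS = {"COG", "KOG", "Pfam"}
print(classify_databases(["COG", "Pfam"], DEFAULTS))  # ('default', ['COG', 'Pfam'])
print(classify_databases(["my_dbs/first"], DEFAULTS))  # ('custom', ['my_dbs/first'])
```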

Also some necessary changes on the tests

The latest miniconda image is not functional, so the version was pinned to 22.11.1.
Added test for custom-database-workflow.
Tests now run simultaneously, instead of one at a time.


20 Apr 12:10

iquasere

1.9.1

113ea67

Fixed several annoyances

No more need to confirm you don't want to gunzip downloaded resource files

If --skip-downloaded is set, reCOGnizer skips both the downloading and the gunzipping.

No more FutureWarning when trying to sum COGs

.sum(numeric_only=True) fixed that.


18 Apr 10:18

iquasere

1.9.0

0c51677

reCOGnizer is called without ".py"

Now called as "recognizer"

Previously, reCOGnizer was called from the shell as recognizer.py. Now, it is called as recognizer.

Now removes intermediate folders

Unused directories (tmp, rpsbproc and others), whose files were already being removed, are now removed as well.

Also, several fixes

Fixed conversion COG2KO.
Fixed FutureWarning by replacing xlsx_report.save() with xlsx_report.close().

Updated documentation

Added a nice interactive krona plot.
Also corrected the parameter descriptions, and added an explanation of the taxonomy functionality.


03 Oct 16:36

iquasere

1.8.3

b6ac206

Fix on outputting COG categories

A reformatting of how reCOGnizer outputs information broke its ability to output COG categories.

It is fixed now.


08 Aug 19:43

iquasere

1.8.2

aea1f84

Increase maximum SMPs per database

Set option -max_smp_vol 1000000 for the makeprofiledb command.

Context: the blast package had an update, and the makeprofiledb tool now outputs a database for each 1000 HMM profiles by default.
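
For illustration, the resulting command could be assembled as in the sketch below. Only -max_smp_vol 1000000 comes from this release note; the helper name, the file names and the use of just the -in and -out options are assumptions, and the command reCOGnizer actually runs may pass further options.

```python
import shlex


def makeprofiledb_command(pn_file, out_prefix, max_smp_vol=1_000_000):
    """Build a makeprofiledb call that keeps all profiles in a single volume."""
    return [
        "makeprofiledb",
        "-in", pn_file,          # list of .smp profiles (hypothetical file name)
        "-out", out_prefix,      # prefix of the database to create
        "-max_smp_vol", str(max_smp_vol),
    ]


command = makeprofiledb_command("COG.pn", "COG")
print(shlex.join(command))  # makeprofiledb -in COG.pn -out COG -max_smp_vol 1000000
```

The list form can be passed directly to `subprocess.run(command, check=True)` on a machine where the blast package is installed.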

