Make database paths configurable - add GetOrganelle to Galaxy #64

Closed
bernt-matthias opened this issue Dec 26, 2020 · 20 comments

@bernt-matthias

It seems to me that the databases are currently stored next to the library. This does not work for read-only installations (e.g. containers and multi-user installations) and is considered bad practice for conda installations (even if writable).

Ideally the path could be set via an environment variable or a command line parameter.

@Kinggerm
Owner

How about making the databases or a file storing the path under "~/.GetOrganelle" by default?

@bernt-matthias
Author

Thanks for the feedback.

I would prefer a command line parameter (~/.GetOrganelle might be a good default). Only allowing ~/.GetOrganelle would not be helpful for multi-user installations, where an admin might want to provide the databases at a central location for all users.

@Kinggerm
Owner

In the latest update (version 1.7.3), the database location is resolved in the following order of priority:

  1. The database for each single run can be customized using a command line parameter following the flag "--config-dir".
  2. If "--config-dir" is not set, GetOrganelle looks for the shell environment variable GETORG_PATH, so the admin can set a global default for all users.
  3. If GETORG_PATH is not set either, the default is "~/.GetOrganelle".
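
For illustration, the lookup order above could be sketched roughly as follows. This is not GetOrganelle's actual code: the function name `resolve_config_dir` and its parameters are hypothetical, and only the flag "--config-dir", the variable GETORG_PATH, and the default "~/.GetOrganelle" come from this thread.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location named in this thread.
DEFAULT_CONFIG_DIR = "~/.GetOrganelle"


def resolve_config_dir(
    cli_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the database directory by priority (hypothetical helper).

    1. value passed on the command line (--config-dir)
    2. GETORG_PATH environment variable
    3. ~/.GetOrganelle
    """
    env = os.environ if env is None else env
    if cli_value:
        chosen = cli_value
    elif env.get("GETORG_PATH"):
        chosen = env["GETORG_PATH"]
    else:
        chosen = DEFAULT_CONFIG_DIR
    # Expand a leading "~" so the default points into the user's home.
    return Path(chosen).expanduser()
```

Passing the environment as an argument is only a convenience for testing; an admin-wide default would simply be exported as GETORG_PATH before the tool is started.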

@Kinggerm
Owner

Kinggerm commented Jan 20, 2021

I just noticed that this is for the Galaxy project. My collaborator @wbyu has mentioned adding GetOrganelle to Galaxy many times, and we are very interested in contributing. Please let me know if there's anything else we can help with.

@bernt-matthias
Author

bernt-matthias commented Jan 20, 2021

Good to know :) Indeed some "small" test data and an example command line would simplify creating such a tool a bit.

@Kinggerm
Owner

We have a small simulated dataset, along with the command:
Example 1/2 (https://github.com/Kinggerm/GetOrganelle/wiki/Examples)

and a few real test datasets (also very small ones):
Example 3/4/5 (https://github.com/Kinggerm/GetOrganelle/wiki/Examples)

@bernt-matthias
Author

Downloaded reference data with python get_organelle_config.py -a all --config-dir config

Trying to get the examples running: python get_organelle_from_reads.py ... --config-dir config/. This gives me:

############################################################################
ERROR: /home/berntm/.GetOrganelle/SeedDatabase/embplant_pt.fasta not found!

I'm also wondering if you could switch from optparse (which is deprecated) to argparse? For argparse I could auto-generate Galaxy wrappers. For get_organelle_from_reads I started with the conversion to argparse - if you like I could open a PR.
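
As a rough illustration of what such a conversion involves, here is a minimal argparse sketch. It is not taken from the GetOrganelle sources: only the script name and the "--config-dir" flag appear in this thread, while the `build_parser` function, the `dest` name, and the help text are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser with a single option (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="get_organelle_from_reads.py")
    # With optparse this would have been:
    #   parser.add_option("--config-dir", dest="config_dir", default=None)
    parser.add_argument(
        "--config-dir",
        dest="config_dir",
        default=None,
        help="directory holding the seed and label databases (assumed wording)",
    )
    return parser


# parse_args() returns the namespace directly, whereas optparse returned
# an (options, args) tuple.
args = build_parser().parse_args(["--config-dir", "config/"])
print(args.config_dir)
```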

@Kinggerm
Owner

Kinggerm commented Feb 11, 2021

Thanks for the feedback, and sorry about the remaining issues. I believe I have now fixed them and tested the fix in several ways.
Please find the latest version on GitHub.

Sure. Please branch out from the latest master if you haven't started the conversion. Thanks again!

@Kinggerm
Owner

@bernt-matthias
I just made several updates and fixes to a new GetOrganelle version, in which I switched from optparse to argparse.
It's currently on a different branch from the master: https://github.com/Kinggerm/GetOrganelle/tree/update_assembly_with_variable_overlaps

@bernt-matthias
Author

@Kinggerm this looks great :)

@Kinggerm changed the title from "Make database paths configurable" to "Make database paths configurable - add GetOrganelle to Galaxy" on Apr 15, 2021

@Kinggerm
Owner

1.7.4 is now formally released, with all of the above requirements fulfilled.

@bernt-matthias
Author

Excellent. I hope to find the time to wrap this for Galaxy soon.

@Kinggerm
Owner

Kinggerm commented Oct 8, 2021

@bernt-matthias
Hi, do you have further updates?

@bernt-matthias
Author

Thanks for the reference to the small test data. Is there also a small seed (and label) database? For the IUC Galaxy tool repo we are aiming at <1MB per test file.

@Kinggerm
Owner

Just back from traveling. Sure, I can prepare a small seed and label database for it ASAP. I will keep you updated.

@bernt-matthias
Author

Hi @Kinggerm any news on small seed and label databases?

@Kinggerm
Owner

Kinggerm commented May 6, 2022

Hi @bernt-matthias, I created a minimal dataset derived from 0.0.1, named it 0.0.1.minima, and uploaded it to https://github.com/Kinggerm/GetOrganelleDB. It is downloadable through https://github.com/Kinggerm/GetOrganelleDB/releases/download/0.0.1.minima/v0.0.1.minima.tar.gz.
Please also update GetOrganelle to 1.7.6.1 to use it smoothly; with older GetOrganelle versions you have to download the files manually and add the database with get_organelle_config.py --use-local.

In total, 0.0.1.minima is 1.1MB uncompressed. The size of the seed-label pair for each organelle type is as follows:

| type | size | note |
| --- | --- | --- |
| animal_mt | 36KB | |
| embplant_pt+embplant_mt | 684KB | These two types must be used together |
| embplant_nr | 16KB | |
| fungus_mt | 84KB | |
| fungus_nr | 8KB | |
| other_pt | 328KB | |

However, please note that the local files will grow considerably once you format them into Bowtie2 and BLASTn indices.
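
As a quick sanity check of the sizes listed above against the "<1MB per test file" target mentioned earlier in this thread, the arithmetic can be sketched as below. The numbers are copied from the table; treating 1MB as 1024KB is an assumption, and the check covers only the uncompressed seed-label pairs, not the formatted indices.

```python
# Sizes (in KB) of the seed-label pair per organelle type, copied from the table.
sizes_kb = {
    "animal_mt": 36,
    "embplant_pt+embplant_mt": 684,
    "embplant_nr": 16,
    "fungus_mt": 84,
    "fungus_nr": 8,
    "other_pt": 328,
}

KB_PER_MB = 1024  # assumption: binary megabytes

total_kb = sum(sizes_kb.values())
total_mb = total_kb / KB_PER_MB

# Pairs that would not fit under a 1MB limit on their own.
over_limit = [name for name, kb in sizes_kb.items() if kb >= KB_PER_MB]

print(f"total: {total_kb}KB (about {total_mb:.1f}MB)")
print(f"pairs at or above 1MB: {over_limit}")
```

The total comes to 1156KB, which matches the stated 1.1MB, and every individual pair stays below 1MB.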

Please let me know if these make sense.

@bernt-matthias
Author

Thanks a lot. We will try to use those for the Galaxy tool wrapper tests.

@bernt-matthias
Author

Seems that we can close this issue. Thanks for the help.
