Program killed during early extension; RAM issue unlikely #266

Closed
3 tasks done
marcelglueck opened this issue May 11, 2023 · 2 comments

marcelglueck commented May 11, 2023

First check

  • I used the GitHub search to find a similar issue and didn't find it.
  • I searched the GetOrganelle.wiki, especially the FAQ, and browsed the examples to confirm that this behaviour is unexpected.
  • I have updated GetOrganelle to the latest GitHub release.

Describe the bug
GetOrganelle aborts early during extension ("Killed") without further information. RAM is unlikely to be the limiting factor, as --out-per-round is specified and 128 GB of RAM are provided.

Command executed:

get_organelle_from_reads.py -1 $read1 -2 $read2 -o ${dout}${subdout} -R 40 \
-k 21,45,65,85,105,135 -F $seedDB -t 40 --config-dir ${dbloc}${seedDB} --overwrite \
--reduce-reads-for-coverage 1000 -J 1 -M 1 --max-n-words 5E9 \
--disentangle-time-limit 5000 --out-per-round
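
To double-check whether the kernel's OOM killer is responsible, the kernel log on the compute node can be inspected right after a crash (a sketch only; this assumes dmesg is readable on the node):

dmesg -T | grep -iE 'out of memory|oom|killed process'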

Additional context
Compared to earlier runs (#264), I decided to increase the word limit, as the log files showed that the word limit was reached after only ~7 rounds of extension (40 requested). Processing is performed on 40 cores with 128 GB of RAM. Interestingly, early termination of the extension is most often observed when the combined database "embplant_pt,embplant_mt" is used.

The following warning message is interesting, as you advised against using the separate embplant_pt and embplant_mt databases (#263):

WARNING: Multi-organelle mode (with the same data size and word size) is not suggested for organelles with divergent base-coverages.
2023-05-11 02:18:29,069 - WARNING: Please try to get different organelles in separate runs, or to use other seeds to get a better estimation of coverage values.

Can the multi-organelle issue be addressed by disabling the auto-estimation of the word size and setting it to an arbitrarily low value?
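
For instance, something like the following (just a sketch of what I have in mind: the same command as above, with -w added to bypass the auto-estimation; the value 60 is arbitrary):

get_organelle_from_reads.py -1 $read1 -2 $read2 -o ${dout}${subdout} -R 40 \
-k 21,45,65,85,105,135 -F $seedDB -t 40 --config-dir ${dbloc}${seedDB} --overwrite \
--reduce-reads-for-coverage 1000 -J 1 -M 1 --max-n-words 5E9 -w 60 \
--disentangle-time-limit 5000 --out-per-round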

Thank you for your help.

UPDATE: Now assemblies using only the animal_mt database also fail early during extension (rounds 2-3). This is much earlier than with the old parameters (#264). Could the current parameter selection be suboptimal, and if so, why?

Species1_log.txt
Species2_log.txt

Using the old parameters, assembly was successful for this species, but now aborts early:
Species3_log.txt

UPDATE2: The crashes of GetOrganelle produced core dump files. I inspected some of them using the file command. Here is the output:

core.1577: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/lib/spades/bin/spades-core /beegfs/work/workspace/ws/**redacted**-', real uid: 900898, effective uid: 900898, real gid: 500001, effective gid: 500001, execfn: '/usr/lib/spades/bin/spades-core', platform: 'x86_64'

core.18421: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/lib/spades/bin/spades-core /beegfs/work/workspace/ws/**redacted**-', real uid: 900898, effective uid: 900898, real gid: 500001, effective gid: 500001, execfn: '/usr/lib/spades/bin/spades-core', platform: 'x86_64'
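
In case it is useful, the cores can also be opened with gdb to get a backtrace (a sketch, assuming gdb and the matching spades-core binary are available on the node):

gdb /usr/lib/spades/bin/spades-core core.1577
(gdb) bt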

UPDATE3: To rule out a RAM issue, I have now performed runs on the cluster's SMP queue. The command executed was:

get_organelle_from_reads.py -1 $read1 -2 $read2 -o ${dout}${subdout} -R 40 \
-k 21,45,65,85,105,135 -F $seedDB -t 14 --config-dir ${dbloc}${seedDB} \
--overwrite --reduce-reads-for-coverage inf --max-reads inf -J 1 -M 1 --max-n-words 6E9 -w 60 \
--disentangle-time-limit 15000 --out-per-round

Each job was provided 600 GB of RAM. Still, the jobs were aborted.
smp_run.log.txt

I am at my wit's end...

JianjunJin (Collaborator) commented May 26, 2023

If this thread were not so miscellaneous, it would be more helpful to everyone. Please avoid posting issues like this; one issue per theme keeps questions clear and saves time.
Sorry, I might have been light-headed this afternoon, which made this thread feel miscellaneous to me. Now I am clearer about the information, though it is still overwhelming.

  • I'm not specialized in hardware/running-environment issues, so I don't have a clue here. According to Species1_log.txt, the peak memory is only 13.437 G, so I agree that it is not a memory issue. Did you get any information or help from your cluster administrator? I would like to learn.

  • "Multi-organelle mode is not suggested" is true. It means assembling multiple organelles in one run, which was a bad idea and will be deprecated in the next release. One should never use it. However, using combined databases (e.g. emplant_pt+emplant_mt) for assembling one organelle (e.g. emplant_pt), is totally a different thing. This is because pt and mt usually shared (sometimes a lot) similar sequences due to e.g. HGT. Using the combined databases was used to separate them better (see GetOrganelle paper for more). By default, once databases have been initialized, -F embplant_pt will trigger the usage of both emplant_pt and embplant_mt.

  • The current automatic parameter selection for animal_mt can frequently be terrible, as indicated in the homepage instructions (see the word size part), and this is not limited to the word size. This is mainly because we are plant guys and care too much about plants over other groups. Upon publishing, it was good enough compared to the available tools, but there is definitely plenty of room for improvement for the animal and fungal communities (e.g. animal_nr, #136). I want to make it catch up with people's needs after I finish my current postdoc projects, which are interesting but barely related to organelle genomes.
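
Regarding the combined databases in the second point: for illustration, a minimal single-organelle plastome run could look like the following (a sketch only; the read file names are placeholders and the -R/-k values follow the README suggestion for embplant_pt). The embplant_mt label database will be used internally alongside embplant_pt by default, as described above.

get_organelle_from_reads.py -1 forward.fq -2 reverse.fq -o plastome_output -R 15 -k 21,45,65,85,105 -F embplant_pt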

marcelglueck (Author) commented

Thank you for your detailed help. I have now consulted the cluster's admin and we figured out that it was an out-of-memory error. It was caused by trying to reserve multiple nodes on the HPC: in that case, PBS ignores the memory requested and allocates whatever memory is left on the nodes, which caused the processes to run out of memory and abort.
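
For anyone hitting the same problem: keeping the job on a single node should avoid it. A sketch of such a request (PBS Pro style "select" syntax; the exact directive depends on the scheduler flavour and site configuration):

#PBS -l select=1:ncpus=40:mem=128gb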

Thanks also for providing additional assistance with the databases. I have now left the multi-organelle mode behind. All assemblies I ran worked fine, and I am impressed by how well the animal_mt assemblies worked out with only minor adjustments.
