pod5 filter freeze #95

Open
diego-rt opened this issue Dec 9, 2023 · 3 comments
@diego-rt commented Dec 9, 2023

Hello,

When running pod5 filter, the process often freezes, seemingly at random. Rerunning it usually completes successfully. It is annoying because the process does not exit with an error code; it just hangs until the timeout is reached.

This is the command:

pod5 filter ${pod5_dir} -t ${task.cpus} -r --ids filtered.channel_\${channel}.txt --missing-ok --output ./filtered.channel_\${channel}.pod5 

This is the output:

Parsed 98 reads_ids from: filtered.channel_1381.txt
terminate called without an active exception

Thanks!

@HalfPhoton (Collaborator)

Hi @diego-rt,

We're reworking subset, which is the underlying process used by filter, to significantly lower resource usage and improve performance. This is also mentioned here: #93 (comment)

We'll hopefully get this out before year end.


However, to help out in the meantime:

It looks like you're running in Nextflow, based on the syntax of your command. I would recommend trying / exploring the following, which will hopefully improve reliability (a sketch combining these directives follows the list).

  • Reduce -t ${task.cpus} - this option has only a small effect in filter, and lowering it does not affect filter's runtime performance.
  • Increase the memory allocated to the task.
  • Use maxForks to limit the number of concurrent tasks.
    • Reducing the number of parallel tasks might improve stability, especially with a large number of input files, since filtering / subsetting can hold a very large number of open file descriptors.
  • Set errorStrategy 'retry' so failing jobs are retried automatically.
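
As a rough illustration, here is a minimal sketch of a Nextflow DSL2 process combining these directives. The process name, channel structure and resource values are assumptions for illustration only, not recommendations from the pod5 team.

process POD5_FILTER {
    // Illustrative values only; tune them for your data and infrastructure.
    cpus 1                 // filter gains little from extra threads
    memory '6 GB'          // raise this if tasks are killed or hang under memory pressure
    maxForks 16            // cap the number of concurrent filter tasks
    errorStrategy 'retry'  // retry a task that exits with an error
    maxRetries 3

    input:
    tuple val(channel), path(ids)   // e.g. [1381, filtered.channel_1381.txt]
    path pod5_dir

    output:
    path "filtered.channel_${channel}.pod5"

    script:
    """
    pod5 filter ${pod5_dir} -t ${task.cpus} -r \\
        --ids ${ids} --missing-ok \\
        --output ./filtered.channel_${channel}.pod5
    """
}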

I hope these points help in the meantime, and we'll get back to you soon with an update.

Kind regards,
Rich

@diego-rt (Author)

Hi @HalfPhoton,

Yes, indeed, I'm using Nextflow with only one thread and 3 GB of memory. I think the issue is that I've heavily parallelized it, so several hundred jobs access the same file simultaneously, which understandably leads to some I/O error. I should maybe reduce the number of forks, that's true.

But I think the main problem is that the process hangs without exiting. It would be fine if it just died with an error exit code, because the job would then simply be retried, but since it does not actually exit, the process just sits there until the timeout.
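
For what it's worth, here is a minimal sketch of how a hang like this could be turned into a retryable failure, assuming the executor enforces Nextflow's time directive. The time limit and process selector name are placeholders, not values from this thread.

// nextflow.config (hypothetical selector; adjust to the real process name)
process {
    withName: 'POD5_FILTER' {
        time          = '30m'     // wall-clock cap: a task hanging past this limit is killed
        errorStrategy = 'retry'   // the killed task then exits non-zero and is retried
        maxRetries    = 3
    }
}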

@HalfPhoton (Collaborator)

Yes, you're absolutely correct, and these changes will be incorporated into the new design of filter and subset, which will be more stable for large numbers of inputs / outputs and scale better for use cases like yours.

Best regards,
Rich
