batch jobs do not abort when error occurs #181

Open
lvankampenhout opened this issue Nov 6, 2018 · 3 comments

@lvankampenhout

Problem: whenever an error occurs somewhere deep in the Python code, the batch job hangs instead of aborting. When I log in to the compute node I see 100% CPU usage. I'm not sure whether this is specific to my local cluster (I ported the scripts to the SLURM cluster Cartesius) or to the postprocessing scripts themselves. Either way it is clearly suboptimal, because the jobs have to be aborted manually.

@lvankampenhout lvankampenhout changed the title jobs do not abort when error occurred batch jobs do not abort when error occurred Nov 6, 2018
@lvankampenhout lvankampenhout changed the title batch jobs do not abort when error occurred batch jobs do not abort when error occurs Nov 6, 2018
@bertinia bertinia added the bug label Nov 7, 2018
@bertinia bertinia self-assigned this Nov 7, 2018
@bertinia
Contributor

bertinia commented Nov 7, 2018

@lvankampenhout - is there a particular postprocessing task where the job doesn't abort correctly?

@lvankampenhout
Author

Hi Alice, I encountered this issue with both the lnd_averages and timeseries tasks.

@lvankampenhout
Author

Strangely enough, my jobs do abort today.
