Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for processes to exit instead of STOP result #19

Merged
merged 4 commits into from
Aug 29, 2023

Conversation

yvanoers
Copy link
Contributor

The current implementation waits for all results and stops waiting based on a "STOP" signal one for each spawned worker.
This causes the main loop to wait forever if any of the subprocesses gets killed, e.g. by the OOM killer, because the STOP signal isn't sent. We ran into this problem.

This change does away with the STOP marker. The processes themselves are waited upon and while they run the results are collected.

If a worker gets killed, the remaining work gets picked up by the other workers.
However, when all workers have exited, but any one of them was killed, an exception is raised to signal not all computations may have been done.
This behavior may need to change to either
-- only raise an exception if it is sure some work was not done, or
-- terminate the other workers early because an exception will be thrown when they're done anyway.

@wouterbles
Copy link
Owner

Thanks for your contribution! Personally, I think it would make the most sense to go for option 2, so terminate all workers when one is terminated. What do you think?

@yvanoers
Copy link
Contributor Author

Ideally, if a worker gets killed, the other workers would finish the work and redo anything the killed worker was working on.
Such behavior requires some redesign though. Also, #14 may lead to changes in this regard nevertheless.

So I agree terminating the other workers early is the better option at this time.

@yvanoers
Copy link
Contributor Author

@wouterbles
I added the implementation for option 2.

@wouterbles wouterbles merged commit b4675e1 into wouterbles:main Aug 29, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants