Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work-around for dask/distributed#7726 and dask-cuda#1148 #1151

Closed
wants to merge 1 commit into from

Conversation

wence-
Copy link
Contributor

@wence- wence- commented Apr 3, 2023

The last atexit finalizer in the chain of finalizers that distributed registers needs to be one in the weakref module. Otherwise we observe hangs in the shutdown of UCX-Py where one thread has the gil but not the UCX spinlock and vice-versa.

To ensure this, unregister the known shutdown finalizers in distributed, add our weakref finalizer, and then re-register them. Along with a final re-implementation of the "python shutting down" handler which must run before other handlers in distributed.

The last atexit finalizer in the chain of finalizers that distributed
registers needs to be one in the weakref module. Otherwise we observe
hangs in the shutdown of UCX-Py where one thread has the gil but not
the UCX spinlock and vice-versa.

To ensure this, unregister the known shutdown finalizers in
distributed, add our weakref finalizer, and then re-register them.
Along with a final re-implementation of the "python shutting down"
handler which must run before other handlers in distributed.
@wence- wence- requested a review from a team as a code owner April 3, 2023 16:49
@github-actions github-actions bot added the python python code needed label Apr 3, 2023
@wence-
Copy link
Contributor Author

wence- commented Apr 3, 2023

If @randerzander reports success, I think we could try this workaround? (cc @quasiben, @pentschev)

@pentschev
Copy link
Member

Please see #1148 (comment), I think this may not be sufficient to fully resolve the issue without at least minor user intervention (assuming this problem is solely occurring in the client).

@pentschev pentschev added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Apr 3, 2023
@wence-
Copy link
Contributor Author

wence- commented Apr 27, 2023

Closing for now since it seems we don't have the necessity for this.

@wence- wence- closed this Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - DO NOT MERGE Hold off on merging; see PR for details python python code needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants