UCT/UD: Fix completion callback not called for flush operation #7854
Force-pushed from 1de9461 to bce3dea.
Does it mean that some transports (specifically, UD) don't guarantee uct_ep_flush(LOCAL) completion even after discard (i.e. after uct_ep_pending_purge() + uct_ep_flush(CANCEL) + uct_ep_destroy())?
Should they guarantee flush operation completion? If yes, I guess UD (and other transports) should be fixed.
src/ucp/core/ucp_worker.c (outdated)
@@ -2515,7 +2515,7 @@ static void ucp_worker_destroy_eps(ucp_worker_h worker,
 ucs_debug("worker %p: destroy %s endpoints", worker, ep_type_name);
 ucs_list_for_each_safe(ep_ext, tmp, ep_list, ep_list) {
 ep = ucp_ep_from_ext_gen(ep_ext);
-/* Cleanup pending operations on the UCP EP before destroying it, since
+/* Cleanup pending operations on the UCP EP before destroying it, since
pls, fix alignment and remove TAB
btw, unrelated change
RC and UD seem to call the user flush completion from uct_ep_destroy; maybe something is not fully working with UD - checking.
Before this fix, the iface async completion queue dispatch could remove a completion desc from the queue without calling the upper-layer callback. This can happen when we destroy two UD endpoints and the queue contains completions for both of them: we would remove all completions, but call the callback only for the first destroyed endpoint. As a result, some UCP requests were not released when lanes were destroyed. The fix is to remove only the relevant completion descs from the queue.
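To make the failure mode concrete, here is a minimal standalone sketch of the two behaviors. It is not the UD code: `comp_desc_t`, `async_queue_t`, `purge_buggy`, `purge_fixed` and `destroy_two_eps` are hypothetical names invented for illustration, and the queue is a plain singly linked list rather than the UCS queue used by the transport.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are not UCX types. */
typedef struct completion {
    void (*func)(struct completion *self);
    int  called;
} completion_t;

typedef struct comp_desc {
    struct comp_desc *next;
    int               ep_id; /* endpoint that owns this completion */
    completion_t     *comp;
} comp_desc_t;

typedef struct {
    comp_desc_t *head;
} async_queue_t;

static void mark_called(completion_t *self)
{
    self->called = 1;
}

/* Append a descriptor at the tail of the queue. */
static void queue_push(async_queue_t *q, comp_desc_t *d)
{
    comp_desc_t **pp = &q->head;

    assert(d->next == NULL);
    while (*pp != NULL) {
        pp = &(*pp)->next;
    }
    *pp = d;
}

/* Buggy variant: drains the whole queue, but invokes callbacks only for the
 * endpoint being destroyed. Descriptors of other endpoints are lost. */
static int purge_buggy(async_queue_t *q, int ep_id)
{
    comp_desc_t *d;
    int invoked = 0;

    while ((d = q->head) != NULL) {
        q->head = d->next;
        if (d->ep_id == ep_id) {
            d->comp->func(d->comp);
            ++invoked;
        }
    }
    return invoked;
}

/* Fixed variant: unlinks only the descriptors owned by ep_id and leaves
 * the rest queued for their own endpoint. */
static int purge_fixed(async_queue_t *q, int ep_id)
{
    comp_desc_t **pp = &q->head;
    int invoked      = 0;

    while (*pp != NULL) {
        comp_desc_t *d = *pp;
        if (d->ep_id == ep_id) {
            *pp = d->next;
            d->comp->func(d->comp);
            ++invoked;
        } else {
            pp = &d->next;
        }
    }
    return invoked;
}

/* Queue one completion per endpoint, destroy both endpoints in turn, and
 * return how many callbacks were actually invoked. */
static int destroy_two_eps(int (*purge)(async_queue_t *q, int ep_id))
{
    completion_t c1 = {mark_called, 0}, c2 = {mark_called, 0};
    comp_desc_t d1  = {NULL, 1, &c1}, d2 = {NULL, 2, &c2};
    async_queue_t q = {NULL};

    queue_push(&q, &d1);
    queue_push(&q, &d2);
    purge(&q, 1);
    purge(&q, 2);
    return c1.called + c2.called;
}
```

With the buggy variant, destroying endpoint 1 also drops endpoint 2's descriptor, so by the time endpoint 2 is destroyed there is nothing left to complete and its callback never runs. The fixed variant leaves that descriptor queued until its own endpoint is purged.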
Force-pushed from bce3dea to 70b18d5.
@dmitrygx reworked the PR to fix UD, can you pls take a look (more details in commit message)
What
Fix leak warning about the ep_removed flush request (added by ucp_wireup_send_ep_removed) when the worker is destroyed.
Why?
Fix test failures like:
How?
Fix UD transport to not remove uct_completion_t descriptors from the async queue without calling their callback.