-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/NBX: fixed external request free from CB #5998
UCP/NBX: fixed external request free from CB #5998
Conversation
src/ucp/core/ucp_request.inl
Outdated
(_req)->status = (_status); \ | ||
if (ucs_likely((_req)->flags & UCP_REQUEST_FLAG_CALLBACK)) { \ | ||
(_req)->_cb((_req) + 1, (_status), ## __VA_ARGS__); \ | ||
} \ | ||
if (ucs_unlikely(((_req)->flags |= UCP_REQUEST_FLAG_COMPLETED) & \ | ||
if (ucs_unlikely(!external && \ | ||
((_req)->flags |= UCP_REQUEST_FLAG_COMPLETED) & \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space before |=
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/ucp/rma/rma_send.c
Outdated
@@ -171,7 +171,8 @@ ucp_rma_nonblocking(ucp_ep_h ep, const void *buffer, size_t length, | |||
{return UCS_STATUS_PTR(UCS_ERR_NO_MEMORY);}); | |||
|
|||
status = ucp_rma_request_init(req, ep, buffer, length, remote_addr, rkey, | |||
progress_cb, zcopy_thresh, 0); | |||
progress_cb, zcopy_thresh, | |||
req->flags & UCP_REQUEST_FLAG_EXTERNAL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
if it is already set, no need to reset..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ucp_request_get_param just select way how to get request (allocate or use from params) + set flag EXTERNAL,
ucp_rma_request_init doesn't respect any flags from request - here fixed it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need to override request? It looks weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also req->flags
may be undefined for internal request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set it to 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
req->flags is already initialized, why need to set it again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there added more flags
src/ucp/tag/tag_send.c
Outdated
@@ -292,7 +292,7 @@ UCS_PROFILE_FUNC(ucs_status_ptr_t, ucp_tag_send_nbx, | |||
datatype, contig_length, param); | |||
} else { | |||
ucp_tag_send_req_init(req, ep, buffer, datatype, memory_type, count, | |||
tag, 0); | |||
tag, req->flags & UCP_REQUEST_FLAG_EXTERNAL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need reinit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not re-init, req_init fills all values in request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
req->flags is already initialized, why need to set it again? can just do |=
in req_init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there added more flags
src/ucp/core/ucp_request.inl
Outdated
@@ -64,11 +64,13 @@ | |||
|
|||
#define ucp_request_complete(_req, _cb, _status, ...) \ | |||
{ \ | |||
uint32_t external = (_req)->flags & UCP_REQUEST_FLAG_EXTERNAL; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__external
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed
test/gtest/ucp/test_ucp_tag.cc
Outdated
}; | ||
|
||
const size_t test_ucp_tag_fallback::MSG_SIZE = 4 * 1024 * ucs_get_page_size(); | ||
const size_t test_ucp_tag_nbx::MSG_SIZE = 4 * 1024 * ucs_get_page_size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4 * UCS_KBYTE * ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/ucp/core/ucp_request.inl
Outdated
@@ -94,6 +96,7 @@ | |||
} \ | |||
} else { \ | |||
__req = ((ucp_request_t*)(_param)->request) - 1; \ | |||
__req->flags |= UCP_REQUEST_FLAG_EXTERNAL; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
align by =
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aligned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to clear this flag (or init to 0) if it is internal request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, set to 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where? Also why do you do |=
? User may not touch UCX part of request at all, so req->flags may contain garbage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-pushed
src/ucp/core/ucp_request.h
Outdated
#if UCS_ENABLE_ASSERT | ||
UCP_REQUEST_FLAG_STREAM_RECV = UCS_BIT(18), | ||
UCP_REQUEST_DEBUG_FLAG_EXTERNAL = UCS_BIT(19) | ||
UCP_REQUEST_FLAG_STREAM_RECV = UCS_BIT(19), | ||
#else | ||
UCP_REQUEST_FLAG_STREAM_RECV = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for trailing ,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/ucp/core/ucp_request.h
Outdated
#if UCS_ENABLE_ASSERT | ||
UCP_REQUEST_FLAG_STREAM_RECV = UCS_BIT(18), | ||
UCP_REQUEST_DEBUG_FLAG_EXTERNAL = UCS_BIT(19) | ||
UCP_REQUEST_FLAG_STREAM_RECV = UCS_BIT(19), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for trailing ,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, fixed
@dmitrygx, what is the case when |
But where we set |
it seems here |
/azp run |
bot:pipe:retest |
Azure Pipelines successfully started running 1 pipeline(s). |
timed out |
src/ucp/core/ucp_request.inl
Outdated
uint32_t _external = ((_req)->flags |= UCP_REQUEST_FLAG_COMPLETED) & \ | ||
UCP_REQUEST_FLAG_EXTERNAL; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitrygx as you wish
f7e6def
to
addbfff
Compare
completely re-implemented fix, squashed due to full re-implementation |
src/ucp/core/ucp_request.inl
Outdated
@@ -64,12 +64,15 @@ | |||
|
|||
#define ucp_request_complete(_req, _cb, _status, ...) \ | |||
{ \ | |||
/* NOTE: we have to store "RELEASED" flag here to provide backward */ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ucp_request_release function doesn't cancel completion call, to support such behavior we have to store "RELEASED" flag here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated comment
addbfff
to
6170d08
Compare
@hoopoepg test failure seems relevant
|
6170d08
to
1360f39
Compare
yep, issue was in tag-offload buffer dereg. |
@yosefe ok to merge? |
/* Cancel req in transport if it was offloaded, because it arrived | ||
as unexpected */ | ||
ucp_tag_offload_try_cancel(worker, rreq, UCP_TAG_OFFLOAD_CANCEL_FORCE); | ||
ucp_tag_rndv_matched(worker, rreq, rts_hdr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does it fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ucp_tag_rndv_matched
may free request in self transport & access to request became impossible. fix is cancel tag offload buffer first (it is not used in expected recv) and then process rndv operation.
this fix is proposed by @brminich
- fixed crash in completion callback when user is tried to free external request - added gtest - added function wait_for_value for UCP tests
1360f39
to
1795c1f
Compare
bot:pipe:retest |
@yosefe ok to merge? |
external request
UCP_REQUEST_DEBUG_FLAG_EXTERNAL)
fixes #5991