-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP:rndv multirail support infrastructure + protocol #1894
Conversation
hoopoepg
commented
Oct 6, 2017
•
edited
Loading
edited
- added infrastructure to support multi-rail on rndv-RMA:
- updated rndv-rma lane to array
- added memory handles to store registered buffer and remote keys
- added wireup configuration for mrail
- added context parameter to enable mrail (disabled by default) UCX_RNDV_RAILS=1
Test PASSed. |
Test FAILed. |
bot:mlx:retest |
Test FAILed. |
217d8e9
to
3dd33cc
Compare
Test PASSed. |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add ucp_request_send* prefix to all corresponding functions which work with request.send structure and ucp_request_[send]_state* to the functions which work with the state ( #1874 ). BTW, do we plan any mrail infra on recv (request.recv)? it potentially can be needed in case multi protocols over rails.
ok, I will rebase changes to your update
may be later |
src/ucp/core/ucp_ep.inl
Outdated
static inline int ucp_ep_is_rndv_mrail_present(ucp_ep_h ep) | ||
{ | ||
return (ucp_ep_config(ep)->key.rndv_lanes[0] != UCP_NULL_LANE && | ||
ucp_ep_config(ep)->key.rndv_lanes[1] != UCP_NULL_LANE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it enough to check just [1] element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, on some initialization there is clear only rndv_lanes[0] element (and left all other lanes uninitialized), for such cases we check 2 first lanes to be non-NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not you want to check: if have more than 1 RNDV lane -> mrail present
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lane[0] must be initialized (by NULL_LANE or real lane number), lane 1 may be uninitialized, and we have to check both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we have to initialize all RNDV lanes (it's a part of the key and you perform check on it). Then lanes[1] could be either NULL_LANE or some real lane number (which means mrail is used)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will update
src/ucp/core/ucp_request.h
Outdated
@@ -115,6 +115,12 @@ struct ucp_request { | |||
uintptr_t remote_request; /* pointer to the sender's send request */ | |||
uct_rkey_bundle_t rkey_bundle; | |||
ucp_request_t *rreq; /* receive request on the recv side */ | |||
unsigned use_mrail; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by how much does this increase request size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8 * (1+sizeof(rkey_byndle))
what is the problem here?
src/ucp/core/ucp_request.c
Outdated
@@ -175,7 +175,8 @@ void ucp_iov_buffer_memh_dereg(uct_md_h uct_md, uct_mem_h *memh, | |||
UCS_PROFILE_FUNC(ucs_status_t, ucp_request_memory_reg, | |||
(context, rsc_index, buffer, length, datatype, state), | |||
ucp_context_t *context, ucp_rsc_index_t rsc_index, void *buffer, | |||
size_t length, ucp_datatype_t datatype, ucp_dt_state_t *state) | |||
size_t length, ucp_datatype_t datatype, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need to split line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
src/ucp/core/ucp_request.h
Outdated
@@ -196,7 +202,8 @@ void ucp_request_send_buffer_dereg(ucp_request_t *req, ucp_lane_index_t lane); | |||
|
|||
ucs_status_t ucp_request_memory_reg(ucp_context_t *context, ucp_rsc_index_t rsc_index, | |||
void *buffer, size_t length, | |||
ucp_datatype_t datatype, ucp_dt_state_t *state); | |||
ucp_datatype_t datatype, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need to split line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will join
src/ucp/core/ucp_request.inl
Outdated
|
||
static UCS_F_ALWAYS_INLINE int | ||
ucp_request_is_empty_rail(ucp_dt_state_t *state, int rail) { | ||
return state->dt.mrail[rail].memh == UCT_MEM_HANDLE_NULL || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is not just check for lane enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically we may have situation when some lanes doesn't require mem-handle, not sure if it is possible in real world, but we should handle such configuration. will look at it after wireup is complete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's why checking lane for NULL seems to be enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will fix
src/ucp/wireup/select.c
Outdated
ucs_assert(key->rndv_lane == UCP_NULL_LANE); | ||
key->rndv_lane = lane; | ||
/* TODO: add rndv sort */ | ||
ucs_assert(key->rndv_lanes[0] == UCP_NULL_LANE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will you select what lanes are suitable for RNDV?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't know for a while, this is in nearest plans :)
src/ucp/tag/rndv.c
Outdated
/* dereg the original send request since we are going to send on the AM lane next */ | ||
ucp_rndv_rma_request_send_buffer_dereg(sreq); | ||
sreq->send.state.dt.contig.memh = UCT_MEM_HANDLE_NULL; | ||
ucp_request_clear_rails(&sreq->send.state); | ||
} else if (ucp_request_have_rails(&sreq->send.state)) { | ||
ucp_rndv_rma_request_send_buffer_dereg(sreq); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not get this. What if am_lane is the same as one of the rails lane? Then you would not need to deregister send buffer, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for multirail we have to de-register buffers to avoid resource leaks - the problem is in de-registration: we can't de-register lanes 1+, and not de-register lane 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we do so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because it will require deeper refactoring of request/memory registration, which affect non-rndv functionality too
src/ucp/tag/rndv.c
Outdated
(ucp_ep_get_am_lane(ep) != ucp_ep_get_rndv_get_lane(ep))) { | ||
ucp_request_clear_rails(&sreq->send.state); | ||
} else if ((ucp_ep_is_rndv_lane_present(ep, 0)) && | ||
(ucp_ep_get_am_lane(ep) != ucp_ep_get_rndv_get_lane(ep, 0))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation (thought it was not added by you:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
src/ucp/tag/rndv.c
Outdated
@@ -288,16 +354,33 @@ UCS_PROFILE_FUNC(ucs_status_t, ucp_proto_progress_rndv_get_zcopy, (self), | |||
offset, (uintptr_t)rndv_req->send.buffer % align, | |||
(void*)rndv_req->send.buffer + offset, length); | |||
|
|||
if ((rndv_req->send.rndv_get.use_mrail) && | |||
((ucp_request_is_empty_rail(&rndv_req->send.state, rndv_req->send.rndv_get.rail_idx)) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix, but current intent is more "beautiful"
src/ucp/dt/dt.h
Outdated
@@ -24,8 +25,14 @@ typedef struct ucp_dt_state { | |||
size_t offset; /* Total offset in overall payload. */ | |||
union { | |||
struct { | |||
uct_mem_h memh; | |||
} contig; | |||
struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it is worth to have a single struct and use just one element for contig case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is logic to use same lane for am/get and using single element makes processing complicated
Test PASSed. |
6547897
to
bac9ce8
Compare
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test FAILed. |
39bc4fa
to
1d6608f
Compare
Test FAILed. |
Test FAILed. |
1d6608f
to
7cb752d
Compare
bot:retest |
Test FAILed. |
Test PASSed. |
Test FAILed. |
bot:mlx:retest |
Test PASSed. |
Test FAILed. |
28f9af0
to
bdda5df
Compare
- added infrastructure to support multi-rail on rndv-RMA: - updated rndv-rma lane to array - added memory handles to store registered buffer and remote keys - added wireup configuration for mrail - added context parameter to enable mrail (disabled by default) UCX_RNDV_RAILS=1
- fixed typo - fixed indentation
- refactoring of datatypes: reduced size of core structures (request & dt)
- unified multi/single rail ways
- renamed rail->lanes
Test FAILed. |
Test FAILed. |
src/ucp/core/ucp_request.c
Outdated
@@ -344,3 +352,52 @@ ucp_request_send_start(ucp_request_t *req, ssize_t max_short, | |||
|
|||
return UCS_ERR_NO_PROGRESS; | |||
} | |||
|
|||
int ucp_request_rndv_reg(ucp_request_t *req) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to ucp_request_rndv_mem_reg(), also i see the return value is not really needed so might as well make it void
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/ucp/core/ucp_request.c
Outdated
|
||
ucs_assert(UCP_DT_IS_CONTIG(req->send.datatype)); | ||
|
||
ucp_dt_clear_rndv_lanes(state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't have to clear all lanes - can set memh to NULL as an 'else' clause in line 378
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, but this infrastructure may be used for AM rndv too, and we have to add complex logic to clean handles
} | ||
|
||
req->send.reg_rsc = UCP_NULL_RESOURCE; | ||
ucp_dt_clear_rndv_lanes(state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can clear only what has been deregistered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/ucp/core/ucp_request.h
Outdated
*/ | ||
typedef struct ucp_rndv_get_rkey { | ||
ucp_lane_index_t lane_num; /* number of rkeys obtained from peer */ | ||
ucp_lane_index_t lane_idx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a comment to describe this field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
src/ucp/core/ucp_request.h
Outdated
* available - only one rkey is proceeded | ||
*/ | ||
typedef struct ucp_rndv_get_rkey { | ||
ucp_lane_index_t lane_num; /* number of rkeys obtained from peer */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need to keep "lane_num" (which should be renamed to num_lanes) and "lane_idx" in this struct, and not directly in ucp_request.send.rndv_get? this way could avoid this struct altogether:
struct {
uint64_t remote_address; /* address of the sender's data buffer */
uintptr_t remote_request; /* pointer to the sender's send request */
ucp_lane_index_t num_lanes;
ucp_lane_index_t lane_idx;
ucp_request_t *rreq; /* receive request on the recv side */
uct_rkey_bundle_t *rkey_bundle; /* rendezvous-get remote keys */
} rndv_get;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there free space to add more values?
src/ucp/wireup/select.c
Outdated
ucp_wireup_add_lane_desc(lane_descs, num_lanes_p, rsc_index, addr_index, | ||
address_list[addr_index].md_index, score, | ||
UCP_WIREUP_LANE_USAGE_RNDV, 0); | ||
if ((params->field_mask & UCP_EP_PARAM_FIELD_ERR_HANDLING_MODE) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought we disable rndv at all if err handling is enabled.. @evgeny-leksikov ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in original code rndv lane is configured always. disabled for peer handling error mode
src/ucp/wireup/select.c
Outdated
tl_bitmap, remote_md_map, 0, | ||
&rsc_index, &addr_index, &score); | ||
if (status == UCS_OK) { | ||
/* Add lane description and remove all occurrences of the remote md */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it needed to remote occurrences of remote md?
if you want to avoid multiple rndv lanes on same device, need check device name, rather than the md
src/ucp/wireup/select.c
Outdated
/* In case if selected SHM transport - leave it alone, disable multirail */ | ||
break; | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would just do 'if (status!=OK) break' and avoid the 'else'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/ucp/wireup/select.c
Outdated
ucs_qsort_r(key->rma_lanes, UCP_MAX_LANES, sizeof(ucp_lane_index_t), | ||
ucp_wireup_compare_lane_rma_score, lane_descs); | ||
ucs_qsort_r(key->amo_lanes, UCP_MAX_LANES, sizeof(ucp_lane_index_t), | ||
ucp_wireup_compare_lane_amo_score, lane_descs); | ||
|
||
/* Do not sort rndv lanes to save original sequence */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems to contradict the comment '/* Sort RNDV, RMA and AMO lanes according to score */'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment is updated
src/ucs/sys/compiler.h
Outdated
@@ -95,6 +95,12 @@ | |||
|
|||
/** | |||
* Size of statically-declared array | |||
* May be used for arrays in structure values | |||
*/ | |||
#define ucs_countof(_array) (sizeof(_array) / sizeof((_array)[0])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
ucs_static_array_size is not used too, remove it?
BTW, ucs_static_array_size is failed to compile by Intel compiler (due to static assert inside)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, let's remove it
8bc7200
to
f28651d
Compare
Build finished. |
f28651d
to
07e36b9
Compare
Build finished. |
Test FAILed. |
07e36b9
to
e973ea2
Compare
Build finished. |
Test FAILed. |
- functions/values renaming, minor optimization
e973ea2
to
1d7597f
Compare
Test FAILed. |
Build finished. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
@hoopoepg can we close this? |
yes, I will make another PR |