-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCX/RNDV: multirail - updated EP configuration #1981
Conversation
Build finished. |
Test PASSed. |
Test FAILed. |
bot:mlx:retest |
93d6eb9
to
04b7eee
Compare
Test PASSed. |
Build finished. |
Test FAILed. |
bot:ornl:retest |
Test FAILed. |
bot:mlx:retest |
Build finished. |
Test FAILed. |
04b7eee
to
cb9336b
Compare
Build finished. |
Test PASSed. |
Test FAILed. |
cb9336b
to
f6947c8
Compare
Test PASSed. |
Test PASSed. |
Build finished. |
bot:ornl:retest |
Build finished. |
@MattBBaker could you look what is wrong? |
So the test actually succeeds, but then has a hiccup on the JSON it gets from github when setting the build to success. |
bot:ornl:retest |
ok, thank you |
Build finished. |
@MattBBaker something wrong again |
bot:ornl:retest |
Build finished. |
Test PASSed. |
Test FAILed. |
- additional fix for lane config
f9fcbcd
to
22596c1
Compare
Build finished. |
Test PASSed. |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is better to remove RNDV lane and use RMA lane instead at first? Otherwise you would need to remove half of newly written code in the next PRs
src/ucp/core/ucp_context.c
Outdated
@@ -100,6 +100,10 @@ static ucs_config_field_t ucp_config_table[] = { | |||
"the eager_zcopy protocol", | |||
ucs_offsetof(ucp_config_t, ctx.rndv_perf_diff), UCS_CONFIG_TYPE_DOUBLE}, | |||
|
|||
{"MAX_RNDV_LANES", "1", | |||
"Set max multirail-get rendezvous lane numbers", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we get rid of multirail
word here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multi-rail is still here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Maximal number of devices on which a rendezvous operation may be executed in parallel."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
src/ucp/core/ucp_context.h
Outdated
@@ -53,6 +53,8 @@ typedef struct ucp_context_config { | |||
int use_mt_mutex; | |||
/** On-demand progress */ | |||
int adaptive_progress; | |||
/** Rendezvous multirail support */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multilanes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you agree with multilanes instead of multi-rail?
return ucp_ep_config(ep)->key.rndv_lanes[idx] != UCP_NULL_LANE; | ||
} | ||
|
||
static inline int ucp_ep_rndv_num_lanes(ucp_ep_h ep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better use UCS_F_ALWAYS_INLINE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all other funcs there are static inline
will add separate PR to set UCS_F_ALWAYS_INLINE
src/ucp/dt/dt.h
Outdated
@@ -41,4 +42,26 @@ typedef struct ucp_dt_state { | |||
size_t ucp_dt_pack(ucp_datatype_t datatype, void *dest, const void *src, | |||
ucp_dt_state_t *state, size_t length); | |||
|
|||
#endif | |||
static UCS_F_ALWAYS_INLINE void | |||
ucp_dt_clear_rndv_lanes(ucp_dt_state_t *state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it is worth not to mention RNDV, as the code is rather generic and is in dt.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed
src/ucp/tag/rndv.c
Outdated
@@ -340,7 +357,7 @@ static void ucp_rndv_handle_recv_contig(ucp_request_t *rndv_req, ucp_request_t * | |||
} else { | |||
if (rndv_rts_hdr->flags & UCP_RNDV_RTS_FLAG_PACKED_RKEY) { | |||
UCS_PROFILE_CALL(uct_rkey_unpack, rndv_rts_hdr + 1, | |||
&rndv_req->send.rndv_get.rkey_bundle); | |||
&rndv_req->send.rndv_get.rkey_bundle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation
- code beautify
Build finished. |
Test PASSed. |
Test PASSed. |
@brminich pls re-review |
- updated comment & help wording
Build finished. |
Test PASSed. |
Test FAILed. |
bot:mlx:retest |
Test FAILed. |
bot:mlx:retest |
Test FAILed. |
bot:mlx:retest |
Test FAILed. |
bot:mlx:retest |
Test PASSed. |
this is split PR for #1894