-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/RMA: Report the error if doing RMA/AMO for non-HOST memory #4427
Conversation
Mellanox CI: FAILED on 9 of 25 workers (click for details)Note: the logs will be deleted after 18-Nov-2019
|
c37ffa4
to
47807d0
Compare
bot:pipe:retest |
Mellanox CI: FAILED on 25 of 25 workers (click for details)Note: the logs will be deleted after 19-Nov-2019
|
47807d0
to
0e7f9d9
Compare
Mellanox CI: FAILED on 5 of 25 workers (click for details)Note: the logs will be deleted after 19-Nov-2019
|
Mellanox CI: FAILED on 3 of 25 workers (click for details)Note: the logs will be deleted after 20-Nov-2019
|
0e7f9d9
to
020578e
Compare
Mellanox CI: FAILED on 1 of 1 workers (click for details)Note: the logs will be deleted after 20-Nov-2019
|
020578e
to
a8e284a
Compare
a8e284a
to
3b24177
Compare
Mellanox CI: FAILED on 2 of 25 workers (click for details)Note: the logs will be deleted after 20-Nov-2019
|
Mellanox CI: PASSED on 25 workers (click for details)Note: the logs will be deleted after 20-Nov-2019
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic change of checking rma inline seems broken
src/tools/perf/lib/ucp_tests.cc
Outdated
@@ -180,7 +180,9 @@ class ucp_perf_test_runner { | |||
send_started(); | |||
return UCS_OK; | |||
case UCX_PERF_CMD_PUT: | |||
*((uint8_t*)buffer + length - 1) = sn; | |||
m_perf.allocator->memcpy((psn_t*)buffer + length - 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is there a bunch of perftest fixes/changes in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they were implemented to check that error is printed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
src/ucp/rma/rma_send.c
Outdated
@@ -195,6 +204,12 @@ ucp_rma_nonblocking_cb(ucp_ep_h ep, const void *buffer, size_t length, | |||
return ucp_rma_send_request_cb(req, cb); | |||
} | |||
|
|||
int ucp_rma_put_is_inline(size_t length, ucp_rkey_h rkey) | |||
{ | |||
return (ucs_likely((rkey->cache.max_put_short > 0) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after this change there are 2 branches instead of 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/ucp/core/ucp_ep.h
Outdated
* Thresholds with and without non-host memory | ||
*/ | ||
typedef struct ucp_memtype_thresh { | ||
ssize_t memtype_on; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an original indent
fixed
src/ucp/core/ucp_ep.inl
Outdated
{ | ||
if (ucs_likely(max_short->memtype_off > 0)) { | ||
return max_short->memtype_off; | ||
} else if (ucp_memory_type_cache_is_empty(ep->worker->context)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the cache became non-empty after rkey unpack and before ucp_put?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch,
fixed
src/tools/perf/cuda/cuda_alloc.c
Outdated
mem_map_params.field_mask |= UCP_MEM_MAP_PARAM_FIELD_FLAGS; | ||
} | ||
|
||
status = ucp_mem_map(perf->ucp.context, &mem_map_params, memh_p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we doing mem map if RMA is not supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was done to check that we can report an error from UCP RMA operations when trying to do them for non-Host memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
3b24177
to
392e09a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/tools/perf/cuda/cuda_alloc.c
Outdated
mem_map_params.field_mask |= UCP_MEM_MAP_PARAM_FIELD_FLAGS; | ||
} | ||
|
||
status = ucp_mem_map(perf->ucp.context, &mem_map_params, memh_p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was done to check that we can report an error from UCP RMA operations when trying to do them for non-Host memory
src/tools/perf/lib/ucp_tests.cc
Outdated
@@ -180,7 +180,9 @@ class ucp_perf_test_runner { | |||
send_started(); | |||
return UCS_OK; | |||
case UCX_PERF_CMD_PUT: | |||
*((uint8_t*)buffer + length - 1) = sn; | |||
m_perf.allocator->memcpy((psn_t*)buffer + length - 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they were implemented to check that error is printed
src/ucp/core/ucp_ep.h
Outdated
* Thresholds with and without non-host memory | ||
*/ | ||
typedef struct ucp_memtype_thresh { | ||
ssize_t memtype_on; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an original indent
fixed
src/tools/perf/cuda/cuda_alloc.c
Outdated
mem_map_params.field_mask |= UCP_MEM_MAP_PARAM_FIELD_FLAGS; | ||
} | ||
|
||
status = ucp_mem_map(perf->ucp.context, &mem_map_params, memh_p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
src/tools/perf/lib/ucp_tests.cc
Outdated
@@ -180,7 +180,9 @@ class ucp_perf_test_runner { | |||
send_started(); | |||
return UCS_OK; | |||
case UCX_PERF_CMD_PUT: | |||
*((uint8_t*)buffer + length - 1) = sn; | |||
m_perf.allocator->memcpy((psn_t*)buffer + length - 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
src/ucp/core/ucp_ep.inl
Outdated
{ | ||
if (ucs_likely(max_short->memtype_off > 0)) { | ||
return max_short->memtype_off; | ||
} else if (ucp_memory_type_cache_is_empty(ep->worker->context)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch,
fixed
src/ucp/rma/rma_send.c
Outdated
@@ -195,6 +204,12 @@ ucp_rma_nonblocking_cb(ucp_ep_h ep, const void *buffer, size_t length, | |||
return ucp_rma_send_request_cb(req, cb); | |||
} | |||
|
|||
int ucp_rma_put_is_inline(size_t length, ucp_rkey_h rkey) | |||
{ | |||
return (ucs_likely((rkey->cache.max_put_short > 0) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Mellanox CI: FAILED on 22 of 25 workers (click for details)Note: the logs will be deleted after 22-Nov-2019
|
392e09a
to
985cfd4
Compare
Mellanox CI: FAILED on 1 of 1 workers (click for details)Note: the logs will be deleted after 25-Nov-2019
|
Mellanox CI: UNKNOWN on 25 workers (click for details)Note: the logs will be deleted after 25-Nov-2019
|
ecd72a6
to
cacdf20
Compare
Mellanox CI: FAILED on 17 of 25 workers (click for details)Note: the logs will be deleted after 25-Nov-2019
|
@brminich @Akshay-Venkatesh @bureddy could you review pls? |
src/ucp/core/ucp_ep.inl
Outdated
ssize_t length) | ||
{ | ||
return (ucs_likely(length <= max_short->memtype_off) || | ||
(length <= max_short->memtype_on && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
( )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/ucp/core/ucp_rkey.c
Outdated
@@ -19,6 +19,9 @@ | |||
#include <inttypes.h> | |||
|
|||
|
|||
const ucp_memtype_thresh_t ucp_rma_sw_max_put_short = {-1, -1}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed this
done
src/ucp/core/ucp_ep.inl
Outdated
{ | ||
return (ucs_likely(length <= max_short->memtype_off) || | ||
(length <= max_short->memtype_on && | ||
ucp_memory_type_cache_is_empty(ep->worker->context))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we said ucp_memory_type_cache_is_empty() is not really a good test because we are adding the UNKNOWN regions.. so better remote it instead of having more logic depend on it
need just one max_short. it will be a real number of no mem_type_mds, and -1 if there is at least 1 mem type md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
checking ucs_likely(length <= max_short->memtype_off)
would be enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to add memtype on/off for rma
keep one max_short as it was before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to add memtype on/off for rma
okay, I see now
btw, do we need the same for TAG-matching proto? we could remove memtype on/off for it as well and use max_short
that will be -1
in case of there is at least one mem_type MD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we talked about removing it for tag matching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we talked about removing it for tag matching
let's remove it in this PR? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better not: let's keep this PR focused on RMA memtype handling, and cleanup in another PR
src/ucp/rma/rma_send.c
Outdated
req->send.mem_type = ucp_memory_type_detect(ep->worker->context, | ||
buffer, length); | ||
if (ucs_unlikely((req->send.mem_type != UCS_MEMORY_TYPE_HOST) || | ||
(rkey->mem_type != UCS_MEMORY_TYPE_HOST))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remote memory could be non host in case of GPU direct.. if we got the rkey for, it means it was registered on remote side so it's accessible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remote memory could be non host in case of GPU direct.. if we got the rkey for, it means it was registered on remote side so it's accessible
rkey->mem_type
can be non-HOST (e.g. UCS_MEMORY_TYPE_CUDA
) in case of UCX_TLS=tcp,cuda_copy
do you mean that we have to check md_map
to understand whether it was registered on network TL (e.g. IB using GPUDirect) or not?
If we see that this memory (local or remote) was registered using GPUDirect, then we could support UCP RMA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for tcp case, it's different, it's sw-rma protocol which indeed works only on host memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for tcp case, it's different, it's sw-rma protocol which indeed works only on host memory
yes
but if we remove checking remote memory type, then we break doung an RMA operation for H->C
case, when GPUDirect isn't supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so rma_basic protocol should not check remote memtype, only rma_sw should check remote mem type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so rma_basic protocol should not check remote memtype, only rma_sw should check remote mem type
@yosefe thank you for the clarification!
this is common code for both basic and sw RMA, will move a check for remote memory type to sw RMA only
test/gtest/common/mem_buffer.cc
Outdated
std::vector<ucs_memory_type_t> mem_types = mem_buffer::supported_mem_types(); | ||
std::vector<std::pair<ucs_memory_type_t, ucs_memory_type_t> > pairs; | ||
|
||
for (std::vector<ucs_memory_type_t>::const_iterator i = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function should not be in mem_buffer, have some other helper to construct cartesian multiplication
test/gtest/common/mem_buffer.cc
Outdated
mem_buffer::supported_mem_type_pairs() | ||
{ | ||
std::vector<ucs_memory_type_t> mem_types = mem_buffer::supported_mem_types(); | ||
std::vector<std::pair<ucs_memory_type_t, ucs_memory_type_t> > pairs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add typedef for std::pair<ucs_memory_type_t, ucs_memory_type_t>
test/gtest/ucp/test_ucp_rma.cc
Outdated
class test_ucp_rma_basic : public test_ucp_memheap { | ||
public: | ||
void init() { | ||
std::vector<std::pair<ucs_memory_type_t, ucs_memory_type_t> > mem_type_pairs = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
save this in some class-static (global) varaible
test/gtest/ucp/test_ucp_rma.cc
Outdated
|
||
void check_rma_op_status(ucs_status_t status) const { | ||
if ((m_local_mem_type != UCS_MEMORY_TYPE_HOST) || | ||
(m_remote_mem_type != UCS_MEMORY_TYPE_HOST)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should not check remote mem type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not check remote memory type in case of GPUDirect only
otherwise (non-GPUDirect TL), a target side could register CUDA buffer and give it to an RMA initiator - but RMA initiator has to fail such transfer as it unsupported by UCP RMA, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if target does not support GPU-D it would not have registered the memory with IB
8b706c0
to
cd91605
Compare
src/ucp/core/ucp_rkey.c
Outdated
void ucp_rkey_resolve_inner(ucp_rkey_h rkey, ucp_ep_h ep) | ||
/* If we use sw rma/amo need to resolve destination endpoint in order to | ||
* receive responses and completion messages */ | ||
static void ucp_rkey_resolve_desp_ep_ptr(ucp_ep_h ep, ucp_lane_index_t *lane) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolve_desT
_ep_ptr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/ucp/rma/rma_send.c
Outdated
if (ucs_unlikely(status != UCS_OK)) { | ||
return UCS_STATUS_PTR(status); | ||
status_ptr = UCS_STATUS_PTR(status); | ||
goto err; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can put and return here, goto looks excessive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/ucp/rma/rma_send.c
Outdated
UCP_REQUEST_FLAG_RELEASED); | ||
if (ucs_unlikely(status != UCS_OK)) { | ||
return status; | ||
goto err; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can put and return here, goto looks excessive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
test/gtest/common/mem_buffer.h
Outdated
@@ -19,6 +19,7 @@ | |||
*/ | |||
class mem_buffer { | |||
public: | |||
/* get all supported memory types */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
method name is quite self-explaining :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Mellanox CI: FAILED on 4 of 25 workers (click for details)Note: the logs will be deleted after 16-Dec-2019
|
failures are relevant:
|
eddf013
to
be4a1c9
Compare
Mellanox CI: PASSED on 25 workers (click for details)Note: the logs will be deleted after 17-Dec-2019
|
be4a1c9
to
a5a05fd
Compare
Mellanox CI: PASSED on 25 workers (click for details)Note: the logs will be deleted after 23-Dec-2019
|
a5a05fd
to
7b8e7fe
Compare
Mellanox CI: PASSED on 25 workers (click for details)Note: the logs will be deleted after 23-Dec-2019
|
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
7b8e7fe
to
ee55de8
Compare
Mellanox CI: FAILED on 2 of 25 workers (click for details)Note: the logs will be deleted after 31-Dec-2019
|
bot:mlx:retest |
Mellanox CI: FAILED on 2 of 25 workers (click for details)Note: the logs will be deleted after 31-Dec-2019
|
test node #3 lab issue |
Mellanox CI: FAILED on 4 of 25 workers (click for details)Note: the logs will be deleted after 02-Jan-2020
|
Mellanox CI: FAILED on 2 of 25 workers (click for details)Note: the logs will be deleted after 19-Jan-2020
|
What
Report the error and fail RMA operation if a user tries to do RMA for non-HOST memory
after this change, it reports (e.g. from
ucx_perftest -t ucp_put_bw -m cuda
):Why ?
Fixes #4416 (comment)
How ?
Call
ucp_memory_type_detect_mds
and