
UCP/RMA: Report the error if doing RMA/AMO for non-HOST memory #4427

Closed

Conversation

dmitrygx
Member

@dmitrygx dmitrygx commented Nov 11, 2019

What

Report an error and fail the RMA operation if a user tries to do RMA on non-HOST memory.

After this change, it reports (e.g. from ucx_perftest -t ucp_put_bw -m cuda):

[1573548981.748373] [hpc-test-node-gpu02:22308:0]       rma_send.c:135  UCX  ERROR UCP doesn't support RMA for "cuda" memory type
[1573548981.748377] [hpc-test-node-gpu02:22308:0]          rma.inl:45   UCX  WARN  put failed: Invalid parameter

Why?

Fixes #4416 (comment)

How?

Call ucp_memory_type_detect_mds and

src/ucp/rma/rma_send.c (outdated review thread, resolved)
@mellanox-github
Contributor

Mellanox CI: FAILED on 9 of 25 workers (click for details)

Note: the logs will be deleted after 18-Nov-2019

Agent/Stage Status
_main ❌ FAILURE
hpc-arm-hwi-jenkins_W0 ❌ FAILURE
hpc-arm-hwi-jenkins_W1 ❌ FAILURE
hpc-arm-hwi-jenkins_W2 ❌ FAILURE
hpc-arm-hwi-jenkins_W3 ❌ FAILURE
hpc-test-node-legacy_W0 ❌ FAILURE
hpc-test-node-legacy_W1 ❌ FAILURE
hpc-test-node-legacy_W3 ❌ FAILURE
hpc-test-node-new_W0 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx dmitrygx force-pushed the topic/ucp/rma_detect_mem_type branch from c37ffa4 to 47807d0 Compare November 12, 2019 09:05
@dmitrygx
Member Author

bot:pipe:retest

@mellanox-github
Contributor

Mellanox CI: FAILED on 25 of 25 workers (click for details)

Note: the logs will be deleted after 19-Nov-2019

Agent/Stage Status
_main ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ❌ FAILURE
hpc-arm-cavium-jenkins_W1 ❌ FAILURE
hpc-arm-cavium-jenkins_W2 ❌ FAILURE
hpc-arm-cavium-jenkins_W3 ❌ FAILURE
hpc-arm-hwi-jenkins_W0 ❌ FAILURE
hpc-arm-hwi-jenkins_W1 ❌ FAILURE
hpc-arm-hwi-jenkins_W2 ❌ FAILURE
hpc-arm-hwi-jenkins_W3 ❌ FAILURE
hpc-test-node-gpu_W0 ❌ FAILURE
hpc-test-node-gpu_W1 ❌ FAILURE
hpc-test-node-gpu_W2 ❌ FAILURE
hpc-test-node-gpu_W3 ❌ FAILURE
hpc-test-node-legacy_W0 ❌ FAILURE
hpc-test-node-legacy_W1 ❌ FAILURE
hpc-test-node-legacy_W2 ❌ FAILURE
hpc-test-node-legacy_W3 ❌ FAILURE
hpc-test-node-new_W0 ❌ FAILURE
hpc-test-node-new_W1 ❌ FAILURE
hpc-test-node-new_W2 ❌ FAILURE
hpc-test-node-new_W3 ❌ FAILURE
r-vmb-ppc-jenkins_W0 ❌ FAILURE
r-vmb-ppc-jenkins_W1 ❌ FAILURE
r-vmb-ppc-jenkins_W2 ❌ FAILURE
r-vmb-ppc-jenkins_W3 ❌ FAILURE

@dmitrygx dmitrygx force-pushed the topic/ucp/rma_detect_mem_type branch from 47807d0 to 0e7f9d9 Compare November 12, 2019 17:58
@mellanox-github
Contributor

Mellanox CI: FAILED on 5 of 25 workers (click for details)

Note: the logs will be deleted after 19-Nov-2019

Agent/Stage Status
_main ❓ ABORTED
hpc-arm-cavium-jenkins_W2 ❓ ABORTED
hpc-arm-cavium-jenkins_W3 ❓ ABORTED
hpc-arm-hwi-jenkins_W2 ❓ ABORTED
hpc-test-node-gpu_W1 ❓ ABORTED
hpc-test-node-gpu_W2 ❓ ABORTED
hpc-test-node-legacy_W0 ❓ ABORTED
hpc-test-node-legacy_W2 ❓ ABORTED
hpc-arm-hwi-jenkins_W1 ❌ FAILURE
hpc-test-node-gpu_W0 ❌ FAILURE
hpc-test-node-gpu_W3 ❌ FAILURE
hpc-test-node-legacy_W1 ❌ FAILURE
hpc-test-node-new_W1 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W3 ❓ UNKNOWN

@mellanox-github
Contributor

Mellanox CI: FAILED on 3 of 25 workers (click for details)

Note: the logs will be deleted after 20-Nov-2019

Agent/Stage Status
_main ❌ FAILURE
hpc-test-node-gpu_W0 ❌ FAILURE
hpc-test-node-gpu_W3 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx dmitrygx force-pushed the topic/ucp/rma_detect_mem_type branch from 0e7f9d9 to 020578e Compare November 13, 2019 07:43
@mellanox-github
Contributor

Mellanox CI: FAILED on 1 of 1 workers (click for details)

Note: the logs will be deleted after 20-Nov-2019

Agent/Stage Status
_main ❌ FAILURE

@dmitrygx dmitrygx force-pushed the topic/ucp/rma_detect_mem_type branch from 020578e to a8e284a Compare November 13, 2019 07:49
@dmitrygx dmitrygx force-pushed the topic/ucp/rma_detect_mem_type branch from a8e284a to 3b24177 Compare November 13, 2019 10:48
@mellanox-github
Contributor

Mellanox CI: FAILED on 2 of 25 workers (click for details)

Note: the logs will be deleted after 20-Nov-2019

Agent/Stage Status
_main ❓ ABORTED
hpc-test-node-legacy_W0 ❓ ABORTED
hpc-test-node-new_W2 ❓ ABORTED
hpc-test-node-gpu_W0 ❌ FAILURE
hpc-test-node-gpu_W3 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx
Member Author

@brminich @hoopoepg could you review pls?

@mellanox-github
Contributor

Mellanox CI: PASSED on 25 workers (click for details)

Note: the logs will be deleted after 20-Nov-2019

Agent/Stage Status
_main ✔️ SUCCESS
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

Contributor

@yosefe yosefe left a comment


the logic change of checking rma inline seems broken

@@ -180,7 +180,9 @@ class ucp_perf_test_runner {
send_started();
return UCS_OK;
case UCX_PERF_CMD_PUT:
*((uint8_t*)buffer + length - 1) = sn;
m_perf.allocator->memcpy((psn_t*)buffer + length - 1,
Contributor

why is there a bunch of perftest fixes/changes in this PR?

Member Author

they were implemented to check that the error is printed

Member Author

removed

@@ -195,6 +204,12 @@ ucp_rma_nonblocking_cb(ucp_ep_h ep, const void *buffer, size_t length,
return ucp_rma_send_request_cb(req, cb);
}

int ucp_rma_put_is_inline(size_t length, ucp_rkey_h rkey)
{
return (ucs_likely((rkey->cache.max_put_short > 0) &&
Contributor

after this change there are 2 branches instead of 1

Member Author

fixed

src/ucp/tag/tag_send.c (outdated review thread, resolved)
* Thresholds with and without non-host memory
*/
typedef struct ucp_memtype_thresh {
ssize_t memtype_on;
Contributor

indent

Member Author

this is an original indent
fixed

{
if (ucs_likely(max_short->memtype_off > 0)) {
return max_short->memtype_off;
} else if (ucp_memory_type_cache_is_empty(ep->worker->context)) {
Contributor

what if the cache became non-empty after rkey unpack and before ucp_put?

Member Author

good catch,
fixed

mem_map_params.field_mask |= UCP_MEM_MAP_PARAM_FIELD_FLAGS;
}

status = ucp_mem_map(perf->ucp.context, &mem_map_params, memh_p);
Contributor

why are we doing mem map if RMA is not supported?

Member Author

it was done to check that we can report an error from UCP RMA operations when trying to do them for non-Host memory

Member Author

removed

@dmitrygx dmitrygx force-pushed the topic/ucp/rma_detect_mem_type branch from 3b24177 to 392e09a Compare November 15, 2019 18:16
Member Author

@dmitrygx dmitrygx left a comment

@yosefe @brminich force-pushed changes as it's a completely new implementation

@mellanox-github
Contributor

Mellanox CI: FAILED on 22 of 25 workers (click for details)

Note: the logs will be deleted after 22-Nov-2019

Agent/Stage Status
_main ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ❌ FAILURE
hpc-arm-cavium-jenkins_W1 ❌ FAILURE
hpc-arm-cavium-jenkins_W2 ❌ FAILURE
hpc-arm-hwi-jenkins_W0 ❌ FAILURE
hpc-arm-hwi-jenkins_W1 ❌ FAILURE
hpc-arm-hwi-jenkins_W2 ❌ FAILURE
hpc-test-node-gpu_W0 ❌ FAILURE
hpc-test-node-gpu_W1 ❌ FAILURE
hpc-test-node-gpu_W2 ❌ FAILURE
hpc-test-node-gpu_W3 ❌ FAILURE
hpc-test-node-legacy_W0 ❌ FAILURE
hpc-test-node-legacy_W1 ❌ FAILURE
hpc-test-node-legacy_W2 ❌ FAILURE
hpc-test-node-legacy_W3 ❌ FAILURE
hpc-test-node-new_W0 ❌ FAILURE
hpc-test-node-new_W1 ❌ FAILURE
hpc-test-node-new_W2 ❌ FAILURE
hpc-test-node-new_W3 ❌ FAILURE
r-vmb-ppc-jenkins_W1 ❌ FAILURE
r-vmb-ppc-jenkins_W2 ❌ FAILURE
r-vmb-ppc-jenkins_W3 ❌ FAILURE
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS

@dmitrygx dmitrygx added the WIP-DNM Work in progress / Do not review label Nov 18, 2019
@dmitrygx dmitrygx force-pushed the topic/ucp/rma_detect_mem_type branch from 392e09a to 985cfd4 Compare November 18, 2019 11:04
@dmitrygx dmitrygx removed the WIP-DNM Work in progress / Do not review label Nov 18, 2019
@mellanox-github
Contributor

Mellanox CI: FAILED on 1 of 1 workers (click for details)

Note: the logs will be deleted after 25-Nov-2019

Agent/Stage Status
_main ❌ FAILURE

@mellanox-github
Contributor

Mellanox CI: UNKNOWN on 25 workers (click for details)

Note: the logs will be deleted after 25-Nov-2019

Agent/Stage Status
_main ❓ ABORTED
r-vmb-ppc-jenkins_W0 ❓ ABORTED
r-vmb-ppc-jenkins_W1 ❓ ABORTED
r-vmb-ppc-jenkins_W2 ❓ ABORTED
r-vmb-ppc-jenkins_W3 ❓ ABORTED
hpc-arm-cavium-jenkins_W0 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W1 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W2 ❓ UNKNOWN
hpc-arm-cavium-jenkins_W3 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W0 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W1 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W2 ❓ UNKNOWN
hpc-arm-hwi-jenkins_W3 ❓ UNKNOWN
hpc-test-node-gpu_W0 ❓ UNKNOWN
hpc-test-node-gpu_W1 ❓ UNKNOWN
hpc-test-node-gpu_W2 ❓ UNKNOWN
hpc-test-node-gpu_W3 ❓ UNKNOWN
hpc-test-node-legacy_W0 ❓ UNKNOWN
hpc-test-node-legacy_W1 ❓ UNKNOWN
hpc-test-node-legacy_W2 ❓ UNKNOWN
hpc-test-node-legacy_W3 ❓ UNKNOWN
hpc-test-node-new_W0 ❓ UNKNOWN
hpc-test-node-new_W1 ❓ UNKNOWN
hpc-test-node-new_W2 ❓ UNKNOWN
hpc-test-node-new_W3 ❓ UNKNOWN

@dmitrygx dmitrygx force-pushed the topic/ucp/rma_detect_mem_type branch from ecd72a6 to cacdf20 Compare November 18, 2019 14:26
@mellanox-github
Contributor

Mellanox CI: FAILED on 17 of 25 workers (click for details)

Note: the logs will be deleted after 25-Nov-2019

Agent/Stage Status
_main ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ❌ FAILURE
hpc-arm-cavium-jenkins_W1 ❌ FAILURE
hpc-arm-hwi-jenkins_W0 ❌ FAILURE
hpc-arm-hwi-jenkins_W1 ❌ FAILURE
hpc-test-node-gpu_W0 ❌ FAILURE
hpc-test-node-gpu_W1 ❌ FAILURE
hpc-test-node-gpu_W2 ❌ FAILURE
hpc-test-node-gpu_W3 ❌ FAILURE
hpc-test-node-legacy_W0 ❌ FAILURE
hpc-test-node-legacy_W2 ❌ FAILURE
hpc-test-node-legacy_W3 ❌ FAILURE
hpc-test-node-new_W0 ❌ FAILURE
hpc-test-node-new_W1 ❌ FAILURE
hpc-test-node-new_W3 ❌ FAILURE
r-vmb-ppc-jenkins_W1 ❌ FAILURE
r-vmb-ppc-jenkins_W2 ❌ FAILURE
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx
Member Author

@brminich @Akshay-Venkatesh @bureddy could you review pls?

ssize_t length)
{
return (ucs_likely(length <= max_short->memtype_off) ||
(length <= max_short->memtype_on &&
Contributor

( )

Member Author

done

@@ -19,6 +19,9 @@
#include <inttypes.h>


const ucp_memtype_thresh_t ucp_rma_sw_max_put_short = {-1, -1};
Contributor

static

Member Author

missed this
done

{
return (ucs_likely(length <= max_short->memtype_off) ||
(length <= max_short->memtype_on &&
ucp_memory_type_cache_is_empty(ep->worker->context)));
Contributor

i think we said ucp_memory_type_cache_is_empty() is not really a good test because we are adding the UNKNOWN regions, so better remove it instead of having more logic depend on it.
We need just one max_short: it will be a real number if there are no mem_type MDs, and -1 if there is at least one mem_type MD.

Member Author

yes
checking ucs_likely(length <= max_short->memtype_off) would be enough

Contributor

we don't need to add memtype on/off for rma
keep one max_short as it was before

Member Author

we don't need to add memtype on/off for rma

okay, I see now
btw, do we need the same for the TAG-matching proto? we could remove memtype on/off for it as well and use a max_short that will be -1 in case there is at least one mem_type MD

Contributor

yes, we talked about removing it for tag matching

Member Author

yes, we talked about removing it for tag matching

let's remove it in this PR? wdyt?

Contributor

better not: let's keep this PR focused on RMA memtype handling, and clean up in another PR

req->send.mem_type = ucp_memory_type_detect(ep->worker->context,
buffer, length);
if (ucs_unlikely((req->send.mem_type != UCS_MEMORY_TYPE_HOST) ||
(rkey->mem_type != UCS_MEMORY_TYPE_HOST))) {
Contributor

remote memory could be non-host in case of GPUDirect: if we got the rkey for it, it means it was registered on the remote side, so it's accessible

Member Author

remote memory could be non-host in case of GPUDirect: if we got the rkey for it, it means it was registered on the remote side, so it's accessible

rkey->mem_type can be non-HOST (e.g. UCS_MEMORY_TYPE_CUDA) in case of UCX_TLS=tcp,cuda_copy
do you mean that we have to check md_map to understand whether it was registered on network TL (e.g. IB using GPUDirect) or not?
If we see that this memory (local or remote) was registered using GPUDirect, then we could support UCP RMA

Contributor

for tcp case, it's different, it's sw-rma protocol which indeed works only on host memory

Member Author

@dmitrygx dmitrygx Nov 24, 2019

for tcp case, it's different, it's sw-rma protocol which indeed works only on host memory

yes
but if we remove the check for remote memory type, then we break doing an RMA operation for the H->C case, when GPUDirect isn't supported

Contributor

so rma_basic protocol should not check remote memtype, only rma_sw should check remote mem type

Member Author

so rma_basic protocol should not check remote memtype, only rma_sw should check remote mem type

@yosefe thank you for the clarification!
this is common code for both basic and sw RMA, will move the check for remote memory type to sw RMA only

std::vector<ucs_memory_type_t> mem_types = mem_buffer::supported_mem_types();
std::vector<std::pair<ucs_memory_type_t, ucs_memory_type_t> > pairs;

for (std::vector<ucs_memory_type_t>::const_iterator i =
Contributor

this function should not be in mem_buffer; have some other helper construct the cartesian product

mem_buffer::supported_mem_type_pairs()
{
std::vector<ucs_memory_type_t> mem_types = mem_buffer::supported_mem_types();
std::vector<std::pair<ucs_memory_type_t, ucs_memory_type_t> > pairs;
Contributor

add typedef for std::pair<ucs_memory_type_t, ucs_memory_type_t>

class test_ucp_rma_basic : public test_ucp_memheap {
public:
void init() {
std::vector<std::pair<ucs_memory_type_t, ucs_memory_type_t> > mem_type_pairs =
Contributor

save this in some class-static (global) variable


void check_rma_op_status(ucs_status_t status) const {
if ((m_local_mem_type != UCS_MEMORY_TYPE_HOST) ||
(m_remote_mem_type != UCS_MEMORY_TYPE_HOST)) {
Contributor

should not check remote mem type

Member Author

we should not check remote memory type in case of GPUDirect only
otherwise (non-GPUDirect TL), a target side could register a CUDA buffer and give it to an RMA initiator - but the RMA initiator has to fail such a transfer as it is unsupported by UCP RMA, right?

Contributor

if the target does not support GPUDirect, it would not have registered the memory with IB

@dmitrygx dmitrygx force-pushed the topic/ucp/rma_detect_mem_type branch from 8b706c0 to cd91605 Compare November 26, 2019 06:38
void ucp_rkey_resolve_inner(ucp_rkey_h rkey, ucp_ep_h ep)
/* If we use sw rma/amo need to resolve destination endpoint in order to
* receive responses and completion messages */
static void ucp_rkey_resolve_desp_ep_ptr(ucp_ep_h ep, ucp_lane_index_t *lane)
Contributor

resolve_desT_ep_ptr

Member Author

done

src/ucp/rma/rma_send.c (review thread, resolved)
src/ucp/rma/rma_send.c (review thread, resolved)
if (ucs_unlikely(status != UCS_OK)) {
return UCS_STATUS_PTR(status);
status_ptr = UCS_STATUS_PTR(status);
goto err;
Contributor

can put and return here, goto looks excessive

Member Author

done

UCP_REQUEST_FLAG_RELEASED);
if (ucs_unlikely(status != UCS_OK)) {
return status;
goto err;
Contributor

can put and return here, goto looks excessive

Member Author

done

@@ -19,6 +19,7 @@
*/
class mem_buffer {
public:
/* get all supported memory types */
Contributor

method name is quite self-explanatory :)

Member Author

done

@mellanox-github
Contributor

Mellanox CI: FAILED on 4 of 25 workers (click for details)

Note: the logs will be deleted after 16-Dec-2019

Agent/Stage Status
_main ❌ FAILURE
hpc-test-node-gpu_W0 ❌ FAILURE
hpc-test-node-legacy_W0 ❌ FAILURE
hpc-test-node-new_W0 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@brminich
Contributor

brminich commented Dec 9, 2019

failures are relevant:

07:40:06 "/scrap/jenkins/workspace/hpc-ucx-pr/label/hpc-test-node-new/worker/0/contrib/.
07:40:06           ./src/ucp/core/ucp_ep.c", line 1488: error #68: integer conversion
07:40:06           resulted in a change of sign
07:40:06                   rma_config->put.max_short = (context->num_mem_type_detect_mds ? -1 :
07:40:06                                                                                   ^
07:40:06 
07:40:06 "/scrap/jenkins/workspace/hpc-ucx-pr/label/hpc-test-node-new/worker/0/contrib/.
07:40:06           ./src/ucp/core/ucp_ep.c", line 1513: error #68: integer conversion
07:40:06           resulted in a change of sign
07:40:06                   rma_config->get.max_short = (context->num_mem_type_detect_mds ? -1 :

@dmitrygx dmitrygx force-pushed the topic/ucp/rma_detect_mem_type branch from eddf013 to be4a1c9 Compare December 10, 2019 08:42
@mellanox-github
Contributor

Mellanox CI: PASSED on 25 workers (click for details)

Note: the logs will be deleted after 17-Dec-2019

Agent/Stage Status
_main ✔️ SUCCESS
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx dmitrygx changed the title UCP/RMA: Report the error if doing RMA for non-HOST memory UCP/RMA: Report the error if doing RMA/AMO for non-HOST memory Dec 10, 2019
@dmitrygx dmitrygx force-pushed the topic/ucp/rma_detect_mem_type branch from be4a1c9 to a5a05fd Compare December 16, 2019 05:48
@mellanox-github
Contributor

Mellanox CI: PASSED on 25 workers (click for details)

Note: the logs will be deleted after 23-Dec-2019

Agent/Stage Status
_main ✔️ SUCCESS
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@mellanox-github
Contributor

Mellanox CI: PASSED on 25 workers (click for details)

Note: the logs will be deleted after 23-Dec-2019

Agent/Stage Status
_main ✔️ SUCCESS
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx
Member Author

this PR depends on #4588
@brminich @yosefe could you please review? Your comments have been addressed.

@dmitrygx
Member Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@dmitrygx dmitrygx force-pushed the topic/ucp/rma_detect_mem_type branch from 7b8e7fe to ee55de8 Compare December 24, 2019 15:44
@mellanox-github
Contributor

Mellanox CI: FAILED on 2 of 25 workers (click for details)

Note: the logs will be deleted after 31-Dec-2019

Agent/Stage Status
_main ❌ FAILURE
hpc-test-node-legacy_W1 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx
Member Author

bot:mlx:retest

@mellanox-github
Contributor

Mellanox CI: FAILED on 2 of 25 workers (click for details)

Note: the logs will be deleted after 31-Dec-2019

Agent/Stage Status
_main ❌ FAILURE
hpc-test-node-legacy_W1 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx
Member Author

Test node #3 failed due to a lab issue.
bot:mlx:retest

@mellanox-github
Contributor

Mellanox CI: FAILED on 4 of 25 workers (click for details)

Note: the logs will be deleted after 02-Jan-2020

Agent/Stage Status
_main ❌ FAILURE
hpc-test-node-legacy_W1 ❌ FAILURE
hpc-test-node-legacy_W2 ❌ FAILURE
hpc-test-node-legacy_W3 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-new_W0 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

@dmitrygx
Member Author

dmitrygx commented Jan 3, 2020

depends on #4643, #4642, #4641 PRs

@dmitrygx dmitrygx added the WIP-DNM Work in progress / Do not review label Jan 3, 2020
@mellanox-github
Contributor

Mellanox CI: FAILED on 2 of 25 workers (click for details)

Note: the logs will be deleted after 19-Jan-2020

Agent/Stage Status
_main ❌ FAILURE
hpc-test-node-new_W0 ❌ FAILURE
hpc-arm-cavium-jenkins_W0 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W1 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W2 ✔️ SUCCESS
hpc-arm-cavium-jenkins_W3 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W0 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W1 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W2 ✔️ SUCCESS
hpc-arm-hwi-jenkins_W3 ✔️ SUCCESS
hpc-test-node-gpu_W0 ✔️ SUCCESS
hpc-test-node-gpu_W1 ✔️ SUCCESS
hpc-test-node-gpu_W2 ✔️ SUCCESS
hpc-test-node-gpu_W3 ✔️ SUCCESS
hpc-test-node-legacy_W0 ✔️ SUCCESS
hpc-test-node-legacy_W1 ✔️ SUCCESS
hpc-test-node-legacy_W2 ✔️ SUCCESS
hpc-test-node-legacy_W3 ✔️ SUCCESS
hpc-test-node-new_W1 ✔️ SUCCESS
hpc-test-node-new_W2 ✔️ SUCCESS
hpc-test-node-new_W3 ✔️ SUCCESS
r-vmb-ppc-jenkins_W0 ✔️ SUCCESS
r-vmb-ppc-jenkins_W1 ✔️ SUCCESS
r-vmb-ppc-jenkins_W2 ✔️ SUCCESS
r-vmb-ppc-jenkins_W3 ✔️ SUCCESS

Labels: WIP-DNM Work in progress / Do not review
6 participants