Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate rma lane if rkey is not needed for mem type #2995

Merged
merged 6 commits into from
Nov 2, 2018

Conversation

bureddy
Copy link
Contributor

@bureddy bureddy commented Oct 26, 2018

What

Fix rma lane selection for CUDA mem type when there is lane with is not required RKEY(ex CMA)

Why ?

When KNEM is not present, CMA will be selected for RMA for which RKEY is not needed. When RKEY is not needed, it is hard determinically indicate if it can do RMA to specific mem type. With current mem type protocols it is not possible to support all combinations for src and dst mem type combinations( {(host,host) (host,cuda), (cuda,host), (cuda, cuda)) in this scenario.

Fixes #2919

How ?

pack mem_type info in RKEY and validate rma lane mem type with remote and local memory type

@swx-jenkins1
Copy link

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5512/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8104/ for details (Mellanox internal link).

@@ -40,6 +42,8 @@ void ucp_rkey_packed_copy(ucp_context_h context, ucp_md_map_t md_map,
*(ucp_md_map_t*)p = md_map;
p += sizeof(ucp_md_map_t);

*((uint8_t *)p++) = mem_type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls use uct_memory_type_t instead of uint8_t for casting and sizeof where relevant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evgeny-leksikov uct_memory_type_t is defined as enum. wanted to save on rkey packed size with typecasting to uint8_t.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then pls check if we have static assert for UCT_MD_MEM_TYPE_LAST <= 255 or similar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added assert while packing rkey

@swx-jenkins1
Copy link

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5527/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8134/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8141/ for details (Mellanox internal link).

@bureddy
Copy link
Contributor Author

bureddy commented Oct 30, 2018

bot:mlx:retest

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8142/ for details (Mellanox internal link).

@swx-jenkins1
Copy link

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5535/ for details.

@@ -13,7 +13,10 @@
#include <inttypes.h>


static ucp_md_map_t ucp_mem_dummy_buffer = 0;
static struct {
ucp_md_map_t md_map;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent on column

static struct {
ucp_md_map_t md_map;
uint8_t mem_type;
} UCS_S_PACKED ucp_mem_dummy_buffer = {0, 0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use UCT_MEM_TYPE_xx constant instead of 0?

if ((md_index != UCP_NULL_RESOURCE) &&
(!(context->tl_mds[md_index].attr.cap.flags & UCT_MD_FLAG_NEED_RKEY)))
(!(md_attr->cap.flags & UCT_MD_FLAG_NEED_RKEY)) &&
(!rkey || (md_attr->cap.mem_type == mem_type &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( ) around internal conditions
also - maybe use helper bool variable? it's hard to parse this..

@yosefe yosefe added the Bugfix label Oct 31, 2018
@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8158/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8160/ for details (Mellanox internal link).

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5551/ for details.

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5553/ for details.

{
/* Lane does not need rkey, can use the lane with invalid rkey */
*uct_rkey_p = UCT_INVALID_RKEY;
return lane;
if (!rkey || ((md_attr->cap.mem_type == mem_type) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need to check rkey==NULL? we already assume in this function that rkey!=NULL, in line 393

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently, RKEY is not needed for pipeline stating protocol on local mem type(cuda) endpoint rma bw lane (only cuda_copy) for it's zcopy ops to move data from GPU to CPU memory. So, instead creating dummy key pack/unpack I had passed passed NULL here (https://github.com/openucx/ucx/pull/2995/files#diff-f89c8c0721aa5b2b55347e1612b8395bR880).

*uct_rkey_p = UCT_INVALID_RKEY;
return lane;
if (!rkey || ((md_attr->cap.mem_type == mem_type) &&
(md_attr->cap.mem_type == rkey->mem_type))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: can compare rkey->mem_type to mem_type to save pointer dereference

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5559/ for details.

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8174/ for details (Mellanox internal link).

@yosefe yosefe merged commit ba1be52 into openucx:master Nov 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants