
Q: implementation of ucp_request_alloc #5991

Closed
angainor opened this issue Dec 7, 2020 · 8 comments · Fixed by #5998

angainor commented Dec 7, 2020

Currently ucp_request_alloc is defined to always return NULL. I would like to use it to pass my requests to the new *_nbx API. What is the preferred way to implement this on the user-side?

angainor added the Bug label on Dec 7, 2020

angainor commented Dec 7, 2020

Or, in other words, is it now possible for me to use the worker mempool for request management? It would be quite convenient.


yosefe commented Dec 7, 2020

You need to pass your request via the ucp_request_param_t::request field and set UCP_OP_ATTR_FIELD_REQUEST in ucp_request_param_t::op_attr_mask.
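
For illustration, a minimal sketch of that setup; worker, buffer, length, tag, req, and my_recv_callback are placeholder names, and UCP expects its internal request space to be reserved in memory before req:

// Sketch: hand a caller-owned request to a *_nbx operation.
// `req`, `worker`, `buffer`, `length`, `tag`, and `my_recv_callback` are placeholders.
ucp_request_param_t param = {};
param.op_attr_mask = UCP_OP_ATTR_FIELD_REQUEST | UCP_OP_ATTR_FIELD_CALLBACK;
param.request      = req;                 // caller-owned request handle
param.cb.recv      = my_recv_callback;    // completion callback (placeholder)

ucs_status_ptr_t ret = ucp_tag_recv_nbx(worker, buffer, length, tag,
                                        (ucp_tag_t)-1,   // match full tag
                                        &param);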


yosefe commented Dec 7, 2020

In order to use the worker mempool, leave UCP_OP_ATTR_FIELD_REQUEST unset; the UCP send operation will then allocate and return a request if needed. The UCP_OP_ATTR_FLAG_NO_IMM_CMPL flag forces a request to always be allocated, even if the operation completes in place.
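
For illustration, a sketch of the worker-mempool path, with ep, worker, buffer, length, tag, and my_send_callback as placeholder names:

// Sketch: let UCP allocate the request from the worker mempool.
// UCP_OP_ATTR_FIELD_REQUEST is left unset; UCP_OP_ATTR_FLAG_NO_IMM_CMPL
// asks UCP to return a request even when the operation completes immediately.
ucp_request_param_t param = {};
param.op_attr_mask = UCP_OP_ATTR_FIELD_CALLBACK | UCP_OP_ATTR_FLAG_NO_IMM_CMPL;
param.cb.send      = my_send_callback;    // completion callback (placeholder)

ucs_status_ptr_t req = ucp_tag_send_nbx(ep, buffer, length, tag, &param);
if (UCS_PTR_IS_ERR(req)) {
    // handle error
} else if (req != NULL) {
    while (ucp_request_check_status(req) == UCS_INPROGRESS) {
        ucp_worker_progress(worker);
    }
    ucp_request_free(req);    // return the request to the worker mempool
}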


angainor commented Dec 7, 2020

Thanks! When I pass my own allocated request (calloc), is it allowed to free it in the recv callback? Currently I get segfaults when I do that. Previously for UCP-allocated requests I called ucp_request_free in the callback and that worked fine.


angainor commented Dec 7, 2020

@yosefe sorry, disregard my previous post about canceling the request. That was my bug.


yosefe commented Dec 7, 2020

> Thanks! When I pass my own allocated request (calloc), is it allowed to free it in the recv callback? Currently I get segfaults when I do that. Previously for UCP-allocated requests I called ucp_request_free in the callback and that worked fine.

Yes, it should be possible, and if it's not working, that's a bug. Can you please post the backtrace of the segfault?


angainor commented Dec 8, 2020

@yosefe I cannot be entirely sure this is not somehow my fault, but so far I have been unable to find any bug on my side.

[angainor-HP-EliteBook-850-G5:191432:0:191432] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x181)

This is the stack from the core dump:

#0  0x00007f7c1550c331 in ucs_mpool_add_to_freelist (mp=0x181, mp=0x181, add_to_tail=0, elem=0x5632e61fbaa8)
    at /home/angainor/prace_6ip/ucx-1.9.0/src/ucs/datastruct/mpool.inl:79
#1  ucs_mpool_put_inline (obj=0x5632e61fbab0)
    at /home/angainor/prace_6ip/ucx-1.9.0/src/ucs/datastruct/mpool.inl:79
#2  ucp_request_put (req=0x5632e61fbab0) at /home/angainor/prace_6ip/ucx-1.9.0/src/ucp/core/ucp_request.inl:137
#3  ucp_request_complete_tag_recv (status=UCS_OK, req=0x5632e61fbab0)
    at /home/angainor/prace_6ip/ucx-1.9.0/src/ucp/core/ucp_request.inl:159
#4  ucp_eager_tagged_handler (priv_length=0, hdr_len=8, flags=6, am_flags=<optimized out>, length=9, data=<optimized out>, 
    arg=<optimized out>) at tag/eager_rcv.c:106
#5  ucp_eager_only_handler (arg=<optimized out>, data=<optimized out>, length=9, am_flags=0) at tag/eager_rcv.c:137
#6  0x00007f7c14e26cc2 in uct_iface_invoke_am (flags=0, length=<optimized out>, data=0x7f7c0cd1f25c, id=<optimized out>, 
    iface=0x5632e61df080) at /home/angainor/prace_6ip/ucx-1.9.0/src/uct/base/uct_iface.h:635
#7  uct_mm_iface_invoke_am (flags=0, length=<optimized out>, data=0x7f7c0cd1f25c, am_id=<optimized out>, iface=0x5632e61df080)
    at sm/mm/base/mm_iface.h:238
#8  uct_mm_iface_process_recv (elem=0x7f7c0cd1f240, iface=0x5632e61df080) at sm/mm/base/mm_iface.c:232
#9  uct_mm_iface_poll_fifo (iface=0x5632e61df080) at sm/mm/base/mm_iface.c:280
#10 uct_mm_iface_progress (tl_iface=0x5632e61df080) at sm/mm/base/mm_iface.c:333
#11 0x00007f7c154febba in ucs_callbackq_dispatch (cbq=<optimized out>)
    at /home/angainor/prace_6ip/ucx-1.9.0/src/ucs/datastruct/callbackq.h:211
#12 uct_worker_progress (worker=<optimized out>) at /home/angainor/prace_6ip/ucx-1.9.0/src/uct/api/uct.h:2346
#13 ucp_worker_progress (worker=0x7f7c05748010) at core/ucp_worker.c:2040
#14 0x00005632e439836c in main._omp_fn ()
#15 0x00007f7c153a48e6 in GOMP_parallel () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#16 0x00005632e4396381 in main ()

In the code I allocate a custom request and post an nbx recv as follows:

void *req = calloc(ucp_req_size + request_data_size::value, 1);
req = (char *)req + ucp_req_size;   // UCP expects its request space before the handle

ucp_request_param_t param;
param.op_attr_mask = UCP_OP_ATTR_FIELD_CALLBACK | UCP_OP_ATTR_FIELD_REQUEST;
param.cb.recv = (ucp_tag_recv_nbx_callback_t)&communicator::recv_nbx_callback;
param.request = req;

auto ret = ucp_tag_recv_nbx(
                            m_ucp_rw,                                        // worker
                            msg.data(),                                      // buffer
                            msg.size(),                                      // buffer size
                            rtag,                                            // tag
                            ~std::uint_fast64_t(0ul),                        // tag mask
                            &param);                                         // request parameters

and this is a simplified callback:

static void recv_nbx_callback(void *ucx_req, ucs_status_t status, const ucp_tag_recv_info_t *tag_info, void *user_data)
{
    if (status == UCS_OK)
        {
            [...]
            free((char *)ucx_req - ucp_req_size);   // release the original calloc'd block
        }
    else [...]
}
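
(For reference, the ucp_req_size used above would typically come from querying the context; a minimal sketch, assuming an initialized ucp_context_h named context:)

// Sketch: query the per-request size UCP reserves for its internals.
ucp_context_attr_t attr;
attr.field_mask = UCP_ATTR_FIELD_REQUEST_SIZE;

ucs_status_t status = ucp_context_query(context, &attr);
if (status != UCS_OK) {
    // handle error
}
size_t ucp_req_size = attr.request_size;   // bytes to reserve before the request handle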


angainor commented Dec 8, 2020

It seems like UCP is trying to put my allocated request into its own mempool?
