UCP/CORE: Drop packets with invalid REQ or UCP EP IDs #6001
f025184
to
570373b
Compare
b1e6aaa
to
d284266
Compare
failure was #6029
a304d9f
to
ec740b3
Compare
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
bot:pipe:retest
bot:pipe:retest
@brminich @yosefe I'd keep the behavior as is (except dropping packets if EP doesn't exist anymore in PTR MAP), since there are many tests (e.g.
failure was #6011
bot:pipe:retest
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
bot:pipe:retest
@brminich failure is not related, could you review pls?
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
test/gtest/ucp/test_ucp_sockaddr.cc
Outdated
rreq = ucp_tag_recv_nb(receiver().worker(), &recv_buf[0], size,
                       ucp_dt_make_contig(1), 0, 0,
                       rtag_complete_cb);
ucp_tag_recv_info_t recv_info = {};
put this in a separate function
done
test/gtest/ucp/test_ucp_sockaddr.cc
Outdated
if (!err_handling) {
    request_wait(sreq);
    request_wait(rreq);
make a std::vector of the requests we need to wait for
done
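A minimal sketch of the suggestion above, for readers following the thread: collect the requests in a std::vector and wait on them in one loop. The names toy_request, toy_request_wait() and toy_requests_wait() are hypothetical stand-ins, not the helpers from the UCX test suite, and the "wait" here only marks a request as completed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical request handle, for illustration only. In the real test this
// would be the pointer returned by the non-blocking UCP send/recv calls.
struct toy_request {
    bool completed = false;
};

// Hypothetical wait helper: it only marks the request as completed.
// A null request (operation completed immediately) is skipped.
void toy_request_wait(toy_request *req)
{
    if (req != nullptr) {
        req->completed = true;
    }
}

// Wait for every request that was collected, instead of calling the
// wait helper once per named variable.
void toy_requests_wait(const std::vector<toy_request*> &reqs)
{
    for (toy_request *req : reqs) {
        toy_request_wait(req);
    }
}
```

With this shape, the caller pushes sreq and rreq into the vector only when they should be waited for, and the final wait is a single call.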
test/gtest/ucp/test_ucp_sockaddr.cc
Outdated
if (!err_handling) {
    compare_buffers(send_buf, recv_buf);
} else {
    delete slh;
use ucs::auto_ptr<scoped_log_handler>
fixed
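The point of the suggestion is that an owning smart pointer removes the manual delete on the error-handling path. The sketch below uses std::unique_ptr as a stand-in, on the assumption that ucs::auto_ptr plays a similar owning role in the test code; toy_scoped_log_handler and toy_run_test() are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>

// Counts how many toy handlers are currently installed.
static int toy_active_handlers = 0;

// Hypothetical scoped handler: installs itself on construction and
// removes itself on destruction.
struct toy_scoped_log_handler {
    toy_scoped_log_handler() { ++toy_active_handlers; }
    ~toy_scoped_log_handler() { --toy_active_handlers; }
};

// Returns the number of handlers active while the test body runs.
int toy_run_test(bool err_handling)
{
    // The smart pointer owns the handler, so no explicit delete is needed
    // on any exit path.
    std::unique_ptr<toy_scoped_log_handler> slh;
    if (err_handling) {
        slh.reset(new toy_scoped_log_handler());
    }

    // ... test body would go here ...
    return toy_active_handlers;
}
```

The handler is released when slh goes out of scope, including on early returns or exceptions.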
test/gtest/ucp/test_ucp_sockaddr.cc
Outdated
(get_variant_value() &
 SEND_STOP) ?
send_stop : NULL,
(get_variant_value() &
 RECV_STOP) ?
recv_stop : NULL,
can we check the variant inside test_tag_send_recv?
we can't since this is the function from the base class
but we need to check variants and apply them only warmup
any other way to simplify this code?
fixed as you suggested, but this required moving the functions to the test_ucp_sockaddr_protocols class
test/gtest/ucp/test_ucp_sockaddr.cc
Outdated
(err_str.find("ptr map ") != std::string::npos) ||
(err_str.find("was not returned to mpool ucp_requests") !=
why are we OK to have leaks?
yes, it will be fixed later by tracking UCP requests
it is ok, since we close EP for send, we don't expect that recv request won't be completed and vice versa
can we force-release the leaked request somehow?
unfortunately, it could be problematic, since we need to fix C++ compilation of the ucp_request.inl file to use the ucp_request_put() function for force-release of UCP requests, and the more complex thing is to release the PTR map key, since it is needed in RNDV and synchronized protocols
tracking UCP requests will allow us to just remove the warn_leak_hander() and we would expect that no warnings/errors are printed by UCP
> can we force-release the leaked request somehow?
@yosefe done by forcibly setting COMPLETED flag and releasing a request ID if a request is in PTR MAP.
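For readers of the thread, the force-release described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: toy_req, toy_ptr_map, the flag constants and toy_force_release() are all hypothetical, and a std::unordered_map stands in for the PTR MAP.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical request flags, for illustration only.
constexpr unsigned TOY_FLAG_COMPLETED  = 1u << 0;
constexpr unsigned TOY_FLAG_IN_PTR_MAP = 1u << 1;

// Hypothetical request: a flag word plus the id it was registered under.
struct toy_req {
    unsigned      flags = 0;
    std::uint64_t id    = 0;
};

// Stand-in for the PTR MAP: request id -> request pointer.
using toy_ptr_map = std::unordered_map<std::uint64_t, toy_req*>;

// Force-release a leaked request: release its id if it is still registered
// in the map, then mark it as completed so it can be returned to its pool.
void toy_force_release(toy_ptr_map &map, toy_req &req)
{
    if (req.flags & TOY_FLAG_IN_PTR_MAP) {
        map.erase(req.id);
        req.flags &= ~TOY_FLAG_IN_PTR_MAP;
    }
    req.flags |= TOY_FLAG_COMPLETED;
}
```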
@yosefe could you review pls?
64c4221
to
af2d0b1
Compare
@yosefe hope it is what you mean
bot:pipe:retest
@yosefe could you review pls?
the error is unrelated and will be fixed by @gleon99 in #6112
@yosefe could you review pls? failures are unrelated
f4c4d00
to
991ace1
Compare
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
991ace1
to
a5f9ea7
Compare
@yosefe ok to merge?
What
Drop packets with invalid REQ or UCP EP IDs
Why ?
Just drop such packets instead of asserting that requests or EPs are always found by their IDs.
How ?
Look up the EP and the request with ucp_worker_get_ep_by_id() and ucp_worker_get_request_by_id(). If the lookup fails to find a request or an EP, just drop the packet.
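The drop-on-failed-lookup pattern described here can be sketched as below. This is a toy model under stated assumptions, not the UCX implementation: toy_worker, toy_ep, toy_lookup_status, toy_get_ep_by_id() and toy_handle_packet() are hypothetical names, and the real ucp_worker_get_ep_by_id() signature is not reproduced.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical status codes for the lookup, for illustration only.
enum class toy_lookup_status { ok, no_elem };

// Hypothetical endpoint that counts the packets it has processed.
struct toy_ep {
    int packets_handled = 0;
};

// Hypothetical worker: maps endpoint ids to endpoints and counts drops.
struct toy_worker {
    std::unordered_map<std::uint64_t, toy_ep> ep_map;
    int packets_dropped = 0;
};

// Look up an endpoint by id and report a missing id through the returned
// status instead of asserting that the id must exist.
toy_lookup_status toy_get_ep_by_id(toy_worker &worker, std::uint64_t ep_id,
                                   toy_ep **ep_p)
{
    auto it = worker.ep_map.find(ep_id);
    if (it == worker.ep_map.end()) {
        return toy_lookup_status::no_elem;
    }
    *ep_p = &it->second;
    return toy_lookup_status::ok;
}

// Packet handler: a packet that refers to an unknown endpoint id is
// counted and dropped; otherwise it is processed on the endpoint.
bool toy_handle_packet(toy_worker &worker, std::uint64_t ep_id)
{
    toy_ep *ep = nullptr;
    if (toy_get_ep_by_id(worker, ep_id, &ep) != toy_lookup_status::ok) {
        ++worker.packets_dropped;
        return false;
    }
    ++ep->packets_handled;
    return true;
}
```

The same shape would apply to the request lookup: the handler checks the returned status and returns early when the ID is no longer valid, for example because the EP was already closed.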