failure in shm_ib/test_ucp_perf.envelope/0 #3052

Closed
brminich opened this issue Nov 22, 2018 · 2 comments

@brminich

Happened with valgrind here: http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8264/label=hpc-test-node-upstream,worker=0/consoleFull#-853622487b816df86-9afc-46eb-984f-1a9a125d73bb

22:06:31 [ RUN      ] shm_ib/test_ucp_perf.envelope/0
22:06:31 [     INFO ]                tag latency : 0.505 usec (performance not checked)
22:06:32 [     INFO ]            tag iov latency : 1.152 usec (performance not checked)
22:06:32 [     INFO ]                     tag mr : 7.481 Mpps (performance not checked)
22:06:33 [     INFO ]                tag sync mr : 1.906 Mpps (performance not checked)
22:06:34 [     INFO ]                tag wild mr : 6.694 Mpps (performance not checked)
22:06:34 [     INFO ]                     tag bw : 3844.994 MB/sec (performance not checked)
22:06:34 [     INFO ]         tag bw_zcopy_multi : 2567.160 MB/sec (performance not checked)
22:06:34 [     INFO ]                put latency : 0.301 usec (performance not checked)
22:06:35 [     INFO ]                   put rate : 8.512 Mpps (performance not checked)
22:06:35 [     INFO ]                     put bw : 8877.615 MB/sec (performance not checked)
22:06:35 [     INFO ]                get latency : 0.200 usec (performance not checked)
22:06:35 [     INFO ]                     get bw : 7912.120 MB/sec (performance not checked)
22:06:35 [     INFO ]             stream latency : 0.531 usec (performance not checked)
22:06:35 [     INFO ]                  stream bw : 3359.345 MB/sec (performance not checked)
22:06:36 [     INFO ]   stream recv-data latency : 0.547 usec (performance not checked)
22:06:36 [     INFO ]        stream recv-data bw : 9400.656 MB/sec (performance not checked)
22:06:36 [hpc-test-node-upstream:33669:1:56814]      ucp_ep.c:641  UCX Bug: pending request 0x7f01dc6611e8 on ep 0x7f0205b63000 should have been flushed
22:06:36 [hpc-test-node-upstream:33669:0:56813]      ucp_ep.c:641  UCX Bug: pending request 0x7f01ec6463e8 on ep 0x7f0205b23000 should have been flushed
22:06:37 make: *** [test] Segmentation fault (core dumped)
22:06:37 make: Leaving directory `/images/jenkins/workspace/hpc-ucx-pr-2/label/hpc-test-node-upstream/worker/0/build-test/test/gtest'
22:06:37 

May be related to #2089 or #2354.

@yosefe

yosefe commented Nov 22, 2018

@brminich it seems this run was without valgrind:

22:06:37 make: *** [test] Segmentation fault (core dumped)

With valgrind, the failing make target would have been [test-valgrind].

@brminich

OK, I did not notice that.

@yosefe yosefe added this to the v1.5.0 milestone Nov 22, 2018
yosefe added a commit to yosefe/ucx that referenced this issue Nov 22, 2018
We must not update cached tail in uct_mm_ep_flush(), because we don't
execute the pending queue. As a result, we may get new send resources
but not use them, so flush could return UCS_OK while there are still
pending requests.

Fixes openucx#3052
yosefe added a commit to yosefe/ucx that referenced this issue Nov 24, 2018
We must not update cached tail in uct_mm_ep_flush() if there are any
pending elements. We may get new send resources but not use them, so
flush could return UCS_OK while there are still pending requests.

Fixes openucx#3052
yosefe added a commit to yosefe/ucx that referenced this issue Nov 25, 2018
We must not update cached tail in uct_mm_ep_flush() if there are any
pending elements. We may get new send resources but not use them, so
flush could return UCS_OK while there are still pending requests.

Fixes openucx#3052
yosefe added a commit to yosefe/ucx that referenced this issue Nov 25, 2018
We must not update cached tail in uct_mm_ep_flush() if there are any
pending elements. We may get new send resources but not use them, so
flush could return UCS_OK while there are still pending requests.

Fixes openucx#3052
@yosefe yosefe closed this as completed in 1ac5745 Nov 26, 2018
yosefe added a commit to yosefe/ucx that referenced this issue Nov 27, 2018
We must not update cached tail in uct_mm_ep_flush() if there are any
pending elements. We may get new send resources but not use them, so
flush could return UCS_OK while there are still pending requests.

Fixes openucx#3052
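
For readers unfamiliar with the shared-memory transport, the race described in the commit messages above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual UCX code: every name here (`toy_fifo_t`, `toy_ep_t`, `toy_ep_flush_buggy`, `toy_ep_flush_fixed`, `toy_stale_ep`) is hypothetical, and the real `uct_mm_ep_flush()` may be structured differently. The model assumes a sender that keeps a cached copy of the receiver's FIFO tail and a count of pending requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names are real UCX types or functions. */
enum toy_status { TOY_OK, TOY_INPROGRESS };

typedef struct {
    uint64_t head;  /* next slot the sender will write */
    uint64_t tail;  /* advanced by the receiver as it consumes slots */
    uint64_t size;  /* FIFO capacity in slots */
} toy_fifo_t;

typedef struct {
    toy_fifo_t *fifo;
    uint64_t    cached_tail;  /* sender's possibly stale copy of fifo->tail */
    size_t      pending;      /* queued requests waiting for send resources */
} toy_ep_t;

static bool toy_ep_has_tx_resources(const toy_ep_t *ep)
{
    return (ep->fifo->head - ep->cached_tail) < ep->fifo->size;
}

/* Buggy variant: refreshes the cached tail unconditionally. Resources may
 * appear, but nothing dispatches the pending queue, so TOY_OK can be
 * returned while requests are still queued. */
static enum toy_status toy_ep_flush_buggy(toy_ep_t *ep)
{
    if (!toy_ep_has_tx_resources(ep)) {
        ep->cached_tail = ep->fifo->tail;
        if (!toy_ep_has_tx_resources(ep)) {
            return TOY_INPROGRESS;
        }
    }
    return TOY_OK;
}

/* Fixed variant: if anything is pending, leave the cached tail alone and
 * report that the flush is not complete yet. */
static enum toy_status toy_ep_flush_fixed(toy_ep_t *ep)
{
    if (!toy_ep_has_tx_resources(ep)) {
        if (ep->pending > 0) {
            return TOY_INPROGRESS;
        }
        ep->cached_tail = ep->fifo->tail;
        if (!toy_ep_has_tx_resources(ep)) {
            return TOY_INPROGRESS;
        }
    }
    return TOY_OK;
}

/* Builds an endpoint whose cached tail is stale enough that the FIFO
 * looks full to the sender, with the given number of pending requests. */
static toy_ep_t toy_stale_ep(toy_fifo_t *fifo, size_t pending)
{
    assert(fifo->head >= fifo->size);
    toy_ep_t ep = { fifo, fifo->head - fifo->size, pending };
    return ep;
}
```

In this model, an endpoint with one pending request and a stale cached tail gets `TOY_OK` from the buggy variant once the receiver has consumed some slots, which matches the "should have been flushed" message in the log. The fixed variant returns `TOY_INPROGRESS` until the pending queue is empty.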