This happens because ucp_worker_flush() is blocking, so the receiver does not make progress and does not let the sender switch from stub_ep to the real ep.
(gdb) bt
#0 0x00007f5971b64f16 in ?? () from /usr/lib64/libmlx4-rdmav2.so
#1 0x00007f597244f256 in ibv_poll_cq (arg=0x3b8ab90) at /usr/include/infiniband/verbs.h:1271
#2 uct_ib_poll_cq (arg=0x3b8ab90) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../src/uct/ib/base/ib_device.h:267
#3 uct_rc_verbs_iface_poll_rx_common (arg=0x3b8ab90) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../src/uct/ib/rc/verbs/rc_verbs_common.h:154
#4 uct_rc_verbs_iface_progress (arg=0x3b8ab90) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../src/uct/ib/rc/verbs/rc_verbs_iface.c:129
#5 0x00007f597244030a in ucs_callbackq_dispatch (worker=<value optimized out>) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../src/ucs/datastruct/callbackq.h:150
#6 uct_worker_progress (worker=<value optimized out>) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../src/uct/base/uct_worker.c:37
#7 0x00007f5971fdf283 in ucp_worker_progress (worker=0x3b9aa60) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../src/ucp/core/ucp_worker.c:850
#8 0x00007f5971fe22a0 in ucp_worker_flush_inner (worker=0x3b9aa60) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../src/ucp/rma/basic_rma.c:430
#9 ucp_worker_flush (worker=0x3b9aa60) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../src/ucp/rma/basic_rma.c:423
#10 0x000000000057a3b1 in ucp_test_base::entity::flush_worker (this=0x39c0330, worker_index=0) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/ucp/ucp_test.cc:362
#11 0x00000000004f5c70 in test_ucp_memheap::test_nonblocking_implicit_stream_xfer (this=0x3b62650, send=
(void (test_ucp_memheap::*)(test_ucp_memheap *, ucp_test_base::entity *, size_t, void *, ucp_rkey_h, std::string &)) 0x508980 <test_ucp_rma::nonblocking_get_nbi(ucp_test_base::entity*, size_t, void*, ucp_rkey_h, std::string&)>,
size=4730, max_iter=300, alignment=1, malloc_allocate=false, is_ep_flush=false) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/ucp/test_ucp_memheap.cc:118
#12 0x00000000005012e0 in test_ucp_rma_nonblocking_stream_get_nbi_flush_worker_Test::test_body (this=0x3b62650) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/ucp/test_ucp_rma.cc:244
#13 0x0000000000436f9e in ucs::test_base::run (this=0x3b62650) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/test.cc:204
#14 0x0000000000437a0d in ucs::test_base::TestBodyProxy (this=0x3b62650) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/test.cc:230
#15 0x000000000043088d in HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x3b626a8, method=&virtual testing::Test::TestBody(), location=0x64515a "the test body")
at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/gtest-all.cc:3562
#16 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x3b626a8, method=&virtual testing::Test::TestBody(), location=0x64515a "the test body")
at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/gtest-all.cc:3598
#17 0x0000000000428157 in testing::Test::Run (this=0x3b626a8) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/gtest-all.cc:3635
#18 0x000000000042822e in testing::TestInfo::Run (this=0x3ac3ae0) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/gtest-all.cc:3812
#19 0x0000000000428377 in testing::TestCase::Run (this=0x3aa35a0) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/gtest-all.cc:3930
#20 0x000000000042860c in testing::internal::UnitTestImpl::RunAllTests (this=0x3946aa0) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/gtest-all.cc:5802
#21 0x000000000043041d in HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x3946aa0, method=
(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl *)) 0x4283f0 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x646110 "auxiliary test code (environments or event listeners)")
at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/gtest-all.cc:3562
#22 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x3946aa0, method=
(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl *)) 0x4283f0 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x646110 "auxiliary test code (environments or event listeners)")
at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/gtest-all.cc:3598
#23 0x0000000000427859 in testing::UnitTest::Run (this=0x999860) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/gtest-all.cc:5416
#24 0x0000000000431b7f in RUN_ALL_TESTS (argc=1, argv=<value optimized out>) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/gtest.h:20059
#25 main (argc=1, argv=<value optimized out>) at /hpc/mtr_scrap/users/yosefe/ucx/contrib/../test/gtest/common/main.cc:79
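The hang described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not UCX code: `toy_worker_t`, `toy_progress`, `toy_flush`, and `toy_run` are hypothetical names, and the wireup exchange is reduced to a single request/reply pair. It assumes, as in the gtest, that sender and receiver are driven from the same thread, so a flush that spins only on the sender's worker never gives the receiver a chance to answer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names belong to the UCX API. */
typedef struct toy_worker {
    bool               wireup_request_pending; /* request waiting in this worker's inbox */
    bool               wireup_reply_pending;   /* reply waiting in this worker's inbox */
    struct toy_worker *peer;                   /* the worker on the other side */
    bool               ep_is_stub;             /* true until the wireup reply is processed */
    int                outstanding;            /* operations queued behind the stub ep */
} toy_worker_t;

/* One progress call: handle incoming wireup messages, then complete at most
 * one queued operation if the endpoint is already the real one. */
static void toy_progress(toy_worker_t *w)
{
    if (w->wireup_request_pending) {
        w->wireup_request_pending     = false;
        w->peer->wireup_reply_pending = true;  /* answer the peer */
    }
    if (w->wireup_reply_pending) {
        w->wireup_reply_pending = false;
        w->ep_is_stub           = false;       /* switch from stub ep to real ep */
    }
    if (!w->ep_is_stub && (w->outstanding > 0)) {
        w->outstanding--;
    }
}

/* Flush the sender. Returns the number of iterations needed, or -1 if the
 * iteration cap is hit (standing in for the hang seen in the test). */
static int toy_flush(toy_worker_t *sender, toy_worker_t *receiver,
                     bool progress_receiver, int max_iters)
{
    for (int i = 1; i <= max_iters; ++i) {
        toy_progress(sender);
        if (progress_receiver) {
            toy_progress(receiver);
        }
        if (sender->outstanding == 0) {
            return i;
        }
    }
    return -1;
}

/* Set up one sender with queued operations on a stub ep and one receiver
 * holding an unprocessed wireup request, then flush the sender. */
int toy_run(bool progress_receiver)
{
    toy_worker_t sender   = { false, false, NULL, true,  3 };
    toy_worker_t receiver = { true,  false, NULL, false, 0 };

    sender.peer   = &receiver;
    receiver.peer = &sender;
    return toy_flush(&sender, &receiver, progress_receiver, 1000);
}
```

In this model, `toy_run(false)` hits the iteration cap because only the sender is progressed, while `toy_run(true)` completes once the receiver has had a chance to process the wireup request. Whether the real fix belongs in the test (progressing both entities) or in the flush implementation is not something this sketch decides.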
20:12:52 [ RUN ] udrcx/test_ucp_rma.blocking_small/0
20:12:53 [ OK ] udrcx/test_ucp_rma.blocking_small/0 (314 ms)
20:12:53 [ RUN ] udrcx/test_ucp_rma.nonblocking_stream_get_nbi_flush_ep/0
22:19:16 Build timed out (after 150 minutes). Marking the build as failed.
22:19:16 Build was aborted
22:19:16 TAP Reports Processing: START
yosefe changed the title from "hang in udrc/test_ucp_rma.nonblocking_stream_get_nbi_flush_worker/1" to "[jenkins] hang in udrc/test_ucp_rma.nonblocking_stream_get_nbi_flush_worker/1" on Oct 4, 2017
yosefe changed the title from "[jenkins] hang in udrc/test_ucp_rma.nonblocking_stream_get_nbi_flush_worker/1" to "hang in udrc/test_ucp_rma.nonblocking_stream_get_nbi_flush_worker/1" on Oct 4, 2017