http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/5397/label=hpc-test-node,worker=2/consoleFull#98030106524dff065-0424-4e55-b698-eb134734d522
```
22:10:53 [ RUN ] ud/uct_flush_test.am_zcopy_flush_ep_nb/1
22:10:53 [hpc-test-node:31879:0] ud_verbs.c:305 Fatal: Send completion (wr_id=0xFAAFFAAF with error: local protection error
22:10:54
22:10:54 /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../src/uct/ib/ud/verbs/ud_verbs.c: [ uct_ud_verbs_iface_poll_tx() ]
22:10:54 ...
22:10:54 301
22:10:54 302     if (ucs_unlikely(wc.status != IBV_WC_SUCCESS)) {
22:10:54 303         ucs_fatal("Send completion (wr_id=0x%0X with error: %s ",
22:10:54 ==> 304               (unsigned)wc.wr_id, ibv_wc_status_str(wc.status));
22:10:54 305         return 0;
22:10:54 306     }
22:10:54 307
22:10:54
22:10:54 ==== backtrace ====
22:10:54 0 0x0000000000074d1a uct_ud_verbs_iface_poll_tx() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../src/uct/ib/ud/verbs/ud_verbs.c:304
22:10:54 1 0x00000000005273da uct_test::progress() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../src/ucs/datastruct/callbackq.h:168
22:10:54 2 0x00000000004b2b68 uct_flush_test::flush_ep_nb() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../test/gtest/uct/test_flush.cc:273
22:10:54 3 0x00000000004b75a9 uct_flush_test::test_flush_am_zcopy() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../test/gtest/uct/test_flush.cc:193
22:10:54 4 0x00000000004b19e3 uct_flush_test_am_zcopy_flush_ep_nb_Test::test_body() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../test/gtest/uct/test_flush.cc:505
22:10:54 5 0x000000000046fd26 ucs::test_base::run() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../test/gtest/common/test.cc:249
22:10:54 6 0x0000000000467343 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../test/gtest/common/gtest-all.cc:3562
22:10:54 7 0x000000000045b7bd testing::Test::Run() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../test/gtest/common/gtest-all.cc:3635
22:10:54 8 0x000000000045b88c testing::TestInfo::Run() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../test/gtest/common/gtest-all.cc:3812
22:10:54 9 0x000000000045b9ef testing::TestCase::Run() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../test/gtest/common/gtest-all.cc:3930
22:10:54 10 0x0000000000460387 testing::internal::UnitTestImpl::RunAllTests() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../test/gtest/common/gtest-all.cc:5802
22:10:54 11 0x000000000046068b testing::internal::UnitTestImpl::RunAllTests() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../test/gtest/common/gtest-all.cc:5719
22:10:54 12 0x000000000040f193 main() /scrap/jenkins/workspace/hpc-ucx-pr-3/label/hpc-test-node/worker/2/contrib/../test/gtest/common/gtest.h:20059
22:10:54 13 0x0000000000021c05 __libc_start_main() ???:0
22:10:54 14 0x0000000000445f48 _start() ???:0
22:10:54 ===================
22:10:54 Sending notification to mikhailb@mellanox.com
```
This happens because the UD force-close path does not clean up the QP, and does not handle send completions with error.
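To illustrate the missing handling, here is a minimal C sketch of a TX completion handler that tolerates error completions once force-close has started instead of calling a fatal handler unconditionally, as `uct_ud_verbs_iface_poll_tx()` does in the log. Everything in it is hypothetical and is not the UCX API or the actual fix: the `sketch_*` types, the `closing` flag, and the `tx_outstanding` counter are assumptions made for illustration. The two error statuses mirror real ibverbs codes: `IBV_WC_LOC_PROT_ERR` (reported as "local protection error") and `IBV_WC_WR_FLUSH_ERR` (returned for work requests still pending when a QP moves to the error state).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for ibverbs completion statuses (NOT the UCX API). */
typedef enum {
    SKETCH_WC_SUCCESS,
    SKETCH_WC_LOC_PROT_ERR,  /* mirrors IBV_WC_LOC_PROT_ERR */
    SKETCH_WC_WR_FLUSH_ERR   /* mirrors IBV_WC_WR_FLUSH_ERR */
} sketch_wc_status_t;

/* Hypothetical interface state: a force-close flag and a count of posted,
 * not-yet-completed sends. */
typedef struct {
    bool     closing;
    unsigned tx_outstanding;
} sketch_iface_t;

typedef enum {
    SKETCH_TX_OK,         /* normal completion */
    SKETCH_TX_DISCARDED,  /* error completion expected during force-close */
    SKETCH_TX_FATAL       /* unexpected error: caller should abort */
} sketch_tx_result_t;

/* Classify one send completion instead of aborting on every error. */
static sketch_tx_result_t
sketch_handle_tx_completion(sketch_iface_t *iface, sketch_wc_status_t status)
{
    assert(iface != NULL);
    assert(iface->tx_outstanding > 0);
    iface->tx_outstanding--;

    if (status == SKETCH_WC_SUCCESS) {
        return SKETCH_TX_OK;
    }
    if (iface->closing) {
        /* After force-close has begun, buffers or registrations behind the
         * pending sends may already be gone, so error completions are
         * expected and simply dropped. */
        return SKETCH_TX_DISCARDED;
    }
    fprintf(stderr, "send completion with error (status=%d)\n", (int)status);
    return SKETCH_TX_FATAL;
}

/* Hypothetical force-close: mark the interface as closing, then drain the
 * outstanding completions so none are left for a later progress call.
 * Returns the number of error completions that were discarded. */
static unsigned
sketch_force_close_drain(sketch_iface_t *iface,
                         const sketch_wc_status_t *completions, unsigned n)
{
    unsigned discarded = 0;

    iface->closing = true;
    for (unsigned i = 0; i < n && iface->tx_outstanding > 0; ++i) {
        if (sketch_handle_tx_completion(iface, completions[i]) ==
            SKETCH_TX_DISCARDED) {
            ++discarded;
        }
    }
    return discarded;
}
```

The design choice being illustrated is that the decision to abort depends on interface state: an error completion is fatal only while the interface is live, and the close path drains the queue itself rather than leaving stale completions for the next `progress()` call, which is where the test in the log crashed.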
yosefe