udrc/test_ucp_tag_xfer.send_generic_recv_contig_exp_rndv/2 fails on PPC #2027

Closed
evgeny-leksikov opened this issue Nov 30, 2017 · 1 comment

@evgeny-leksikov
Contributor

http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/label=clx-ppc-04.mtl.labs.mlnx,worker=0/5311/consoleFull#98030106524dff065-0424-4e55-b698-eb134734d522

18:50:12 [ RUN      ] udrc/test_ucp_tag_xfer.send_generic_recv_contig_exp_rndv/2
18:50:13 [1512060612.977630] [clx-ppc-04:91106:0]         rcache.c:300  UCX  WARN  mlx5_0: destroying inuse region 0x10038557d20 [0x10038980000..0x10038a98680] g- rw ref 1 lkey 0x1a6567 rkey 0x1a6567 atomic: lkey 0xffffffff rkey 0xffff
18:50:13 [clx-ppc-04:91106:0]      rcache.c:153  Assertion `region->refcount == 0' failed
18:50:13 
18:50:13 /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c: [ ucs_mem_region_destroy_internal() ]
18:50:13       ...
18:50:13       150     ucs_rcache_region_trace(rcache, region, "destroy");
18:50:13       151 
18:50:13       152     ucs_assert(region->refcount == 0);
18:50:13 ==>   153     ucs_assert(!(region->flags & UCS_RCACHE_REGION_FLAG_PGTABLE));
18:50:13       154 
18:50:13       155     if (region->flags & UCS_RCACHE_REGION_FLAG_REGISTERED) {
18:50:13       156         UCS_PROFILE_CODE("mem_dereg") {
18:50:13 
18:50:13 ==== backtrace ====
18:50:13  0 0x000000000005863c ucs_mem_region_destroy_internal()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:153
18:50:13  1 0x000000000005863c ucs_rcache_purge()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:302
18:50:13  2 0x000000000005863c ucs_rcache_t_cleanup()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:629
18:50:13  3 0x000000000005ecf8 ucs_class_call_cleanup_chain()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/type/class.c:50
18:50:13  4 0x000000000005a888 ucs_rcache_destroy()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucs/sys/rcache.c:642
18:50:13  5 0x000000000002ec20 uct_ib_md_close()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/uct/ib/base/ib_md.c:1286
18:50:13  6 0x000000000001ee20 uct_md_close()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/uct/base/uct_md.c:125
18:50:13  7 0x0000000000017100 ucp_free_resources()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucp/core/ucp_context.c:483
18:50:13  8 0x0000000000017100 ucp_cleanup()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../src/ucp/core/ucp_context.c:950
18:50:13  9 0x000000001031387c ucs::handle<ucp_context*, void*>::release()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:380
18:50:13 10 0x000000001031387c ucs::handle<ucp_context*, void*>::reset()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:319
18:50:13 11 0x000000001031387c ~handle()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:314
18:50:13 12 0x000000001031387c ucp_test_base::entity::~entity()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/ucp/ucp_test.cc:335
18:50:13 13 0x0000000010313db4 ucs::ptr_vector_base<ucp_test_base::entity>::release()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:247
18:50:13 14 0x0000000010313db4 ucs::ptr_vector_base<ucp_test_base::entity>::clear()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test_helpers.h:218
18:50:13 15 0x0000000010313db4 ucp_test::cleanup()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/ucp/ucp_test.cc:59
18:50:13 16 0x000000001009f968 ucs::test_base::TearDownProxy()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/test.cc:226
18:50:13 17 0x00000000101e57c8 ucp_test::TearDown()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/ucp/ucp_test.h:104
18:50:13 18 0x0000000010093574 HandleSehExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3562
18:50:13 19 0x0000000010093574 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3598
18:50:13 20 0x0000000010082f84 testing::Test::Run()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3643
18:50:13 21 0x000000001008312c testing::TestInfo::Run()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3812
18:50:13 22 0x0000000010083378 testing::TestCase::Run()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3930
18:50:13 23 0x0000000010089198 testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:5802
18:50:13 24 0x00000000100895cc testing::internal::UnitTestImpl::RunAllTests()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:5719
18:50:13 25 0x00000000100895cc HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3562
18:50:13 26 0x00000000100895cc HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:3598
18:50:13 27 0x00000000100895cc testing::UnitTest::Run()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest-all.cc:5416
18:50:13 28 0x000000001002143c RUN_ALL_TESTS()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/gtest.h:20059
18:50:13 29 0x000000001002143c main()  /scrap/jenkins/workspace/hpc-ucx-pr-2/label/clx-ppc-04.mtl.labs.mlnx/worker/0/contrib/../test/gtest/common/main.cc:77
18:50:13 30 0x0000000000024580 generic_start_main.isra.0()  libc-start.c:0
18:50:13 31 0x0000000000024774 __libc_start_main()  ???:0
18:50:13 ===================
18:50:13 Sending notification to evgenylek@mellanox.com
18:50:18 [clx-ppc-04:91106:0] Process frozen...
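
For readers unfamiliar with the failing check: the log above shows a region being destroyed at teardown while it still holds one reference (`ref 1`), which trips `ucs_assert(region->refcount == 0)`. Below is a minimal sketch of that invariant only. The `toy_*` names are hypothetical and are not part of UCX, and the sketch says nothing about the actual cause of this failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached memory region; NOT the UCX structure. */
typedef struct {
    int refcount; /* number of users that still hold the region */
} toy_region_t;

/* Take a reference, as an in-flight transfer would. */
static void toy_region_get(toy_region_t *region)
{
    region->refcount++;
}

/* Drop a reference and return the remaining count. */
static int toy_region_put(toy_region_t *region)
{
    assert(region->refcount > 0);
    return --region->refcount;
}

/* A region may be destroyed only when nobody holds it any more.
 * Returning 0 here corresponds to the condition the log's assertion rejects. */
static int toy_region_can_destroy(const toy_region_t *region)
{
    return region->refcount == 0;
}

/* Count regions that are still in use, as a purge at teardown would find them. */
static size_t toy_cache_count_inuse(const toy_region_t *regions, size_t count)
{
    size_t inuse = 0;
    size_t i;

    for (i = 0; i < count; ++i) {
        if (!toy_region_can_destroy(&regions[i])) {
            ++inuse;
        }
    }
    return inuse;
}
```

In this sketch, a region that was acquired with `toy_region_get()` but never released with `toy_region_put()` is reported by `toy_cache_count_inuse()`, which is the same situation as the "destroying inuse region" warning in the log.
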
@yosefe
Contributor

yosefe commented Dec 6, 2017

Duplicate of #1977.
