After an RMA operation fails, the following happens (this is a follow-up to issue #1770):
17:45:00 [ RUN ] dcx/test_ucp_rma.nonblocking_stream_get_nbi_flush_ep/1
17:45:01 [1503585901.060628] [hpc-test-node2:40481:3] ib_device.c:171 UCX ERROR IB Async event on mlx5_0: DCT access error on DCTN 0x9374
17:45:01 [1503585901.143067] [hpc-test-node2:40481:4] ib_mlx5_log.c:109 UCX ERROR Error on QP 0x9368 wqe[1]: Invalid request (synd 0x12 vend 0x8a) opcode RDMA_READ
17:45:01 [1503585901.143149] [hpc-test-node2:40481:4] ucp_worker.c:399 UCX ERROR Error Endpoint timeout was not handled for ep 0x35e5310
17:45:01 /scrap/jenkins/scrap/workspace/hpc-ucx-pr-4/label/hpc-test-node2/worker/2/contrib/../test/gtest/ucp/ucp_test.cc:378: Failure
17:45:01 Error: Endpoint timeout
17:45:01 [1503585901.272351] [hpc-test-node2:40481:4] mpool.c:38 UCX WARN object 0x313ef40 was not returned to mpool ucp_requests
...
17:45:01 [1503585901.413624] [hpc-test-node2:40481:4] rcache.c:284 UCX WARN mlx5_0: destroying inuse region 0x35e52b0 [0x7ffff01bd000..0x7ffff045d000] gt- rw ref 1 lkey 0x10451 rkey 0x10451 atomic: lkey 0x1b05 rkey 0x1b05
17:45:01 /scrap/jenkins/scrap/workspace/hpc-ucx-pr-4/label/hpc-test-node2/worker/2/contrib/../test/gtest/common/test.cc:228: Failure
17:45:01 Failed
17:45:01 Got 49 warnings during the test
17:45:01
UCP/RMA: complete failed RMA-GET request
d77b0c8
fixes openucx#1783
da740e5
UCP/RMA: complete failed RMA request
3b8bee2
35c0d01
evgeny-leksikov