Memory leak in error handling flow after RMA op failure #1783

Closed
evgeny-leksikov opened this issue Aug 24, 2017 · 0 comments · Fixed by #2052

Comments

@evgeny-leksikov
Contributor

After an RMA operation fails, the following happens (this is a follow-up to issue #1770):

17:45:00 [ RUN ] dcx/test_ucp_rma.nonblocking_stream_get_nbi_flush_ep/1
17:45:01 [1503585901.060628] [hpc-test-node2:40481:3] ib_device.c:171 UCX ERROR IB Async event on mlx5_0: DCT access error on DCTN 0x9374
17:45:01 [1503585901.143067] [hpc-test-node2:40481:4] ib_mlx5_log.c:109 UCX ERROR Error on QP 0x9368 wqe[1]: Invalid request (synd 0x12 vend 0x8a) opcode RDMA_READ
17:45:01 [1503585901.143149] [hpc-test-node2:40481:4] ucp_worker.c:399 UCX ERROR Error Endpoint timeout was not handled for ep 0x35e5310
17:45:01 /scrap/jenkins/scrap/workspace/hpc-ucx-pr-4/label/hpc-test-node2/worker/2/contrib/../test/gtest/ucp/ucp_test.cc:378: Failure
17:45:01 Error: Endpoint timeout
17:45:01 [1503585901.272351] [hpc-test-node2:40481:4] mpool.c:38 UCX WARN object 0x313ef40 was not returned to mpool ucp_requests
...
17:45:01 [1503585901.413624] [hpc-test-node2:40481:4] rcache.c:284 UCX WARN mlx5_0: destroying inuse region 0x35e52b0 [0x7ffff01bd000..0x7ffff045d000] gt- rw ref 1 lkey 0x10451 rkey 0x10451 atomic: lkey 0x1b05 rkey 0x1b05
17:45:01 /scrap/jenkins/scrap/workspace/hpc-ucx-pr-4/label/hpc-test-node2/worker/2/contrib/../test/gtest/common/test.cc:228: Failure
17:45:01 Failed
17:45:01 Got 49 warnings during the test
17:45:01
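
The `was not returned to mpool ucp_requests` warnings are consistent with requests that are taken from a pool and never released when the operation completes with an error. The sketch below is a toy model of that pattern only, not UCX code: `RequestPool`, `complete`, `run`, and the `release_on_error` flag are hypothetical names made up for illustration, and nothing here claims to be the actual root cause of this issue.

```python
class RequestPool:
    """Toy stand-in for a request memory pool (NOT the UCX mpool API)."""

    def __init__(self, name):
        self.name = name
        self._outstanding = set()
        self._next_id = 0

    def get(self):
        # Hand out a new request id and remember that it is in use.
        req = self._next_id
        self._next_id += 1
        self._outstanding.add(req)
        return req

    def put(self, req):
        # Return a request to the pool.
        self._outstanding.remove(req)

    def cleanup(self):
        # One warning per request that is still outstanding at teardown.
        return [f"object {req} was not returned to mpool {self.name}"
                for req in sorted(self._outstanding)]


def complete(pool, req, ok, release_on_error):
    """Hypothetical completion handler for a single operation."""
    # The leaky variant releases the request only on success.
    if ok or release_on_error:
        pool.put(req)
    return "ok" if ok else "error"


def run(release_on_error):
    """Issue three operations; the first succeeds, the other two fail."""
    pool = RequestPool("ucp_requests")
    reqs = [pool.get() for _ in range(3)]
    for i, req in enumerate(reqs):
        complete(pool, req, ok=(i == 0), release_on_error=release_on_error)
    return pool.cleanup()


if __name__ == "__main__":
    print(run(release_on_error=False))  # two leak warnings
    print(run(release_on_error=True))   # no warnings
```

In this model, the leak shows up only at pool teardown, which matches where the warnings appear in the log: after the test failure has already been reported.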

@alinask alinask added the Bug label Aug 28, 2017
@evgeny-leksikov evgeny-leksikov self-assigned this Dec 7, 2017
evgeny-leksikov added a commit to evgeny-leksikov/ucx that referenced this issue Dec 8, 2017
evgeny-leksikov added a commit to evgeny-leksikov/ucx that referenced this issue Dec 8, 2017
evgeny-leksikov added a commit to evgeny-leksikov/ucx that referenced this issue Dec 8, 2017
evgeny-leksikov added a commit to evgeny-leksikov/ucx that referenced this issue Dec 8, 2017
3 participants