UCT/DC: Fix DCI pending allocation scheme #2740

Merged
merged 1 commit into from
Jul 20, 2018

brminich
Contributor

@brminich brminich commented Jul 19, 2018

This fixes a crash found while running an application.
The application crashed on the following assertion:
dc_ep.c:136 Assertion `ep->dci == UCT_DC_EP_NO_DCI' failed

The problem occurred in the following flow:

  1. TX resources are exhausted.
  2. An operation is added to the pending queue, where it waits for DCI allocation.
  3. Another operation is invoked on the same ep. This forces the ep to take a DCI, but the ep is not descheduled from the DCI allocation pending queue.
  4. The assertion fails.
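
The flow above can be reproduced in a small stand-alone model. This is only a sketch, not UCX code: the types `ep_t` and `iface_t`, the helpers `ep_wait_for_dci`, `ep_take_dci`, `ep_deschedule`, `iface_dispatch_alloc_queue` and `run_flow`, and the `with_fix` switch are all hypothetical names made up for illustration, and `NO_DCI` merely stands in for `UCT_DC_EP_NO_DCI`. It assumes the remedy is to deschedule the ep from the allocation queue at the moment it takes a DCI, which is one plausible reading of step 3, and it counts violations instead of aborting so that both variants can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_DCI  (-1) /* stand-in for UCT_DC_EP_NO_DCI */
#define MAX_EPS 8

/* Hypothetical, simplified endpoint: only the state relevant to the bug. */
typedef struct {
    int  dci;            /* DCI currently held, or NO_DCI */
    bool in_alloc_queue; /* scheduled on the DCI allocation pending queue */
} ep_t;

/* Hypothetical, simplified interface holding the allocation queue. */
typedef struct {
    ep_t  *waiting[MAX_EPS]; /* endpoints waiting for a DCI */
    size_t num_waiting;
    int    next_dci;         /* next DCI index to hand out */
} iface_t;

/* Step 2: an operation could not be sent, so the ep waits for a DCI. */
static void ep_wait_for_dci(iface_t *iface, ep_t *ep)
{
    if (!ep->in_alloc_queue && (iface->num_waiting < MAX_EPS)) {
        iface->waiting[iface->num_waiting++] = ep;
        ep->in_alloc_queue                   = true;
    }
}

/* Remove the ep from the allocation queue, if it is there. */
static void ep_deschedule(iface_t *iface, ep_t *ep)
{
    size_t i, j;

    for (i = 0; i < iface->num_waiting; i++) {
        if (iface->waiting[i] == ep) {
            for (j = i + 1; j < iface->num_waiting; j++) {
                iface->waiting[j - 1] = iface->waiting[j];
            }
            iface->num_waiting--;
            break;
        }
    }
    ep->in_alloc_queue = false;
}

/* Step 3: another operation forces the ep to take a DCI directly.
 * Without the fix, the ep stays on the allocation queue. */
static void ep_take_dci(iface_t *iface, ep_t *ep, bool with_fix)
{
    ep->dci = iface->next_dci++;
    if (with_fix) {
        ep_deschedule(iface, ep);
    }
}

/* Step 4: hand DCIs to queued endpoints. Returns how many of them already
 * held a DCI, i.e. how many times `ep->dci == NO_DCI` would have failed. */
static int iface_dispatch_alloc_queue(iface_t *iface)
{
    int    violations = 0;
    size_t i;

    for (i = 0; i < iface->num_waiting; i++) {
        ep_t *ep = iface->waiting[i];

        if (ep->dci != NO_DCI) {
            violations++;
        } else {
            ep->dci = iface->next_dci++;
        }
        ep->in_alloc_queue = false;
    }
    iface->num_waiting = 0;
    return violations;
}

/* Replays steps 2 to 4 for a single ep and reports the violation count. */
static int run_flow(bool with_fix)
{
    iface_t iface = { { NULL }, 0, 0 };
    ep_t    ep    = { NO_DCI, false };

    ep_wait_for_dci(&iface, &ep);
    ep_take_dci(&iface, &ep, with_fix);
    return iface_dispatch_alloc_queue(&iface);
}
```

Calling `run_flow(false)` replays the failing flow and reports one violation, while `run_flow(true)` deschedules the ep when it takes the DCI and reports none.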

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/4938/ for details.

@mellanox-github
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/7344/ for details (Mellanox internal link).

@yosefe yosefe added the Bugfix label Jul 20, 2018
@yosefe yosefe merged commit 8b78dd5 into openucx:master Jul 20, 2018
@yosefe
Contributor

yosefe commented Jul 20, 2018

@brminich can you pls add this to v1.4 and v1.3 branches as well?
