UCT/DC: Fix DCI pending allocation scheme - v1.3.x #2744

Merged (1 commit) on Jul 27, 2018


@brminich
Contributor

brminich commented Jul 23, 2018

Fixes a problem found while running an application.
The application crashed on the following assertion:
dc_ep.c:136 Assertion `ep->dci == UCT_DC_EP_NO_DCI' failed

The problem occurs in the following flow:

  1. TX resources are exhausted.
  2. An operation is added to the pending queue, where it waits for DCI allocation.
  3. Another operation is invoked on the same ep. This forces the ep to take a DCI, but the ep is not descheduled from the DCI allocation pending queue.
  4. The assertion fails.
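
The flow above can be sketched with a toy model. This is not the UCX code: every `toy_*` name, the fixed-size wait queue, and the `with_fix` switch are hypothetical, and descheduling the ep when it takes a DCI is an assumption about the shape of the fix based on the description in step 3. Instead of aborting, the dispatcher counts the entries that would have tripped the `ep->dci == UCT_DC_EP_NO_DCI` check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NO_DCI      (-1) /* hypothetical stand-in for UCT_DC_EP_NO_DCI */
#define TOY_MAX_WAITERS 8

typedef struct toy_ep {
    int  dci;     /* DCI index held by this endpoint, or TOY_NO_DCI */
    bool waiting; /* true while queued for DCI allocation */
} toy_ep_t;

typedef struct toy_iface {
    int       free_dcis;                   /* DCIs not held by any endpoint */
    toy_ep_t *wait_queue[TOY_MAX_WAITERS]; /* endpoints waiting for a DCI */
    size_t    wait_count;
} toy_iface_t;

/* Steps 1-2: no DCI is available, so the endpoint is queued for allocation. */
static void toy_wait_for_dci(toy_iface_t *iface, toy_ep_t *ep)
{
    if (!ep->waiting && (iface->wait_count < TOY_MAX_WAITERS)) {
        ep->waiting = true;
        iface->wait_queue[iface->wait_count++] = ep;
    }
}

/* Remove the endpoint from the wait queue, if it is there. */
static void toy_deschedule(toy_iface_t *iface, toy_ep_t *ep)
{
    size_t i, kept = 0;

    for (i = 0; i < iface->wait_count; i++) {
        if (iface->wait_queue[i] != ep) {
            iface->wait_queue[kept++] = iface->wait_queue[i];
        }
    }
    iface->wait_count = kept;
    ep->waiting       = false;
}

/* Step 3: a later operation on the same endpoint takes a DCI directly.
 * With with_fix set, the endpoint is also removed from the wait queue. */
static void toy_take_dci(toy_iface_t *iface, toy_ep_t *ep, bool with_fix)
{
    assert(iface->free_dcis > 0);
    ep->dci = --iface->free_dcis;
    if (with_fix) {
        toy_deschedule(iface, ep);
    }
}

/* Step 4: hand out DCIs to queued endpoints. Returns how many queued
 * endpoints already held a DCI, i.e. how many times the real assertion
 * would have fired. */
static int toy_dispatch(toy_iface_t *iface)
{
    int    stale = 0;
    size_t i, kept = 0;

    for (i = 0; i < iface->wait_count; i++) {
        toy_ep_t *ep = iface->wait_queue[i];

        if (ep->dci != TOY_NO_DCI) {
            stale++;             /* queued although it already has a DCI */
            ep->waiting = false;
        } else if (iface->free_dcis > 0) {
            ep->dci     = --iface->free_dcis;
            ep->waiting = false;
        } else {
            iface->wait_queue[kept++] = ep; /* keep waiting */
        }
    }
    iface->wait_count = kept;
    return stale;
}

/* Replays the four steps and returns the number of stale queue entries. */
int toy_run_scenario(bool with_fix)
{
    toy_iface_t iface = { 0, { NULL }, 0 };
    toy_ep_t    ep    = { TOY_NO_DCI, false };

    toy_wait_for_dci(&iface, &ep);        /* steps 1-2 */
    iface.free_dcis = 1;                  /* a DCI becomes available */
    toy_take_dci(&iface, &ep, with_fix);  /* step 3 */
    return toy_dispatch(&iface);          /* step 4 */
}
```

Calling `toy_run_scenario(false)` leaves one stale entry in the queue, which corresponds to the crash; `toy_run_scenario(true)` leaves none.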

Cherry-picked from master (#2740).

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/4943/ for details.

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/7351/ for details (Mellanox internal link).

@brminich
Contributor Author

The failure looks unrelated to this PR; submitted #2746 for it.

bot:mlx:retest

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/7353/ for details (Mellanox internal link).

@mellanox-github
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/7354/ for details (Mellanox internal link).

@brminich
Contributor Author

@hoopoepg, please review this one as well.

@yosefe yosefe merged commit e88e401 into openucx:v1.3.x Jul 27, 2018