v2.0.x comm CID refactor #2381

Merged
merged 10 commits into open-mpi:v2.0.x
Nov 9, 2016
Conversation

hjelmn
Member

@hjelmn hjelmn commented Nov 7, 2016

Fixes #2380

bosilca and others added 5 commits November 7, 2016 14:43
(cherry picked from commit 7397276)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit simplifies communicator context ID (CID) generation by
removing the blocking code. The high-level calls ompi_comm_nextcid
and ompi_comm_activate remain, but they now call the non-blocking
variants and wait on the resulting request. This removes the
parallel paths for context ID generation in preparation for further
improvements to the CID generation code.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 035c2e2)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
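
The "blocking call as a thin wrapper over the non-blocking variant" pattern described in the commit message above can be sketched roughly as follows. This is a toy, single-process illustration only: `toy_request_t`, `toy_nextcid_nb`, `toy_progress`, and `toy_nextcid` are hypothetical names invented for this sketch and are not Open MPI's actual types or functions.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CID 8

/* Hypothetical request object; NOT Open MPI's ompi_request_t. */
typedef struct {
    const bool *used;  /* which context IDs are already taken */
    int candidate;     /* context ID currently being examined */
    bool complete;     /* set once the operation has finished */
} toy_request_t;

/* Non-blocking variant: record the starting state and return at once. */
void toy_nextcid_nb(toy_request_t *req, const bool *used, int start)
{
    assert(start >= 0);
    req->used = used;
    req->candidate = start;
    req->complete = false;
}

/* One unit of progress: examine a single candidate ID. */
void toy_progress(toy_request_t *req)
{
    if (req->complete) {
        return;
    }
    if (req->candidate >= TOY_MAX_CID) {
        req->candidate = -1;   /* no free ID left */
        req->complete = true;
    } else if (!req->used[req->candidate]) {
        req->complete = true;  /* found a free ID */
    } else {
        req->candidate++;      /* taken, try the next one */
    }
}

/* Blocking variant: start the non-blocking call, then wait on it. */
int toy_nextcid(const bool *used, int start)
{
    toy_request_t req;
    toy_nextcid_nb(&req, used, start);
    while (!req.complete) {
        toy_progress(&req);
    }
    return req.candidate;
}
```

With only one code path doing the real work, a fix to the non-blocking logic automatically applies to the blocking entry point as well, which appears to be the motivation given in the commit message.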
(cherry picked from commit 36a9063)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Back-ported from 01a653d

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit b8c9f13)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
…_allreduce_intra_pmix_nb()

(cherry picked from commit bbc6d4b)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Ralph Castain and others added 5 commits November 7, 2016 15:54
(cherry picked from commit ba77d9b)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit should restore the pre-non-blocking behavior of the CID
allocator when threads are used. There are two primary changes: 1)
do not hold the CID allocator lock past the end of a request callback,
and 2) if a communicator with a lower ID is detected during CID
allocation, back off and let it finish before continuing.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit fbbf743)
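
The back-off rule in change 2) above can be illustrated with a small single-threaded simulation. This is only a sketch under assumptions: `pending_alloc_t`, `may_proceed`, and `drain` are hypothetical names, and the real allocator coordinates across threads and processes rather than scanning a local array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one in-flight CID allocation. */
typedef struct {
    int parent_cid;  /* ID of the communicator requesting a new CID */
    bool active;     /* true while the allocation is still pending */
} pending_alloc_t;

/* An allocation may proceed only if no other active allocation
 * belongs to a communicator with a strictly lower ID. */
bool may_proceed(const pending_alloc_t *pending, size_t n, size_t self)
{
    for (size_t i = 0; i < n; ++i) {
        if (i != self && pending[i].active &&
            pending[i].parent_cid < pending[self].parent_cid) {
            return false;  /* back off and let the lower ID finish */
        }
    }
    return true;
}

/* Repeatedly sweep the pending list, completing whatever may proceed.
 * Writes the completion order into `order` and returns the count. */
size_t drain(pending_alloc_t *pending, size_t n, int *order)
{
    size_t done = 0;
    while (done < n) {
        bool progressed = false;
        for (size_t i = 0; i < n; ++i) {
            if (pending[i].active && may_proceed(pending, n, i)) {
                pending[i].active = false;
                order[done++] = pending[i].parent_cid;
                progressed = true;
            }
        }
        if (!progressed) {
            break;  /* nothing left that can run */
        }
    }
    return done;
}
```

Because every participant applies the same ordering rule, concurrent allocations complete in a consistent order, lowest communicator ID first, regardless of the order in which they were started.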
This commit updates the intercomm allgather to do a local comm bcast
as the final step. This should resolve a hang seen in intercomm
tests.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
(cherry picked from commit 54cc829)
Use MPI_MIN instead of MPI_MAX when appropriate; otherwise a CID that
is currently in use can be reused, and bad things will likely happen.

Refs open-mpi#2061

(cherry picked from commit 3b968ec)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
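
To see why the choice of reduction matters, here is a simplified single-process simulation of a two-step agreement: each rank proposes its lowest free CID and the proposals are combined with a maximum, then each rank reports whether the winning candidate is free locally and those flags are combined. This is an assumption-laden sketch, not the Open MPI implementation: `lowest_free`, `agree_on_cid`, `NRANKS`, and `MAX_CID` are hypothetical, and plain loops stand in for the allreduce operations.

```c
#include <assert.h>
#include <stdbool.h>

#define NRANKS 3
#define MAX_CID 16

/* Lowest ID >= start that is free on one rank, or MAX_CID if none. */
int lowest_free(const bool *used, int start)
{
    int cid = start;
    while (cid < MAX_CID && used[cid]) {
        cid++;
    }
    return cid;
}

/* Returns the agreed CID, or -1 if none is available.
 * reduce_with_min selects how the "is it free here?" flags are combined. */
int agree_on_cid(bool used[NRANKS][MAX_CID], int start, bool reduce_with_min)
{
    int next = start;
    assert(start >= 0);
    while (next < MAX_CID) {
        /* Step 1: MAX over every rank's lowest free candidate. */
        int candidate = 0;
        for (int r = 0; r < NRANKS; ++r) {
            int c = lowest_free(used[r], next);
            if (c > candidate) {
                candidate = c;
            }
        }
        if (candidate >= MAX_CID) {
            return -1;
        }
        /* Step 2: combine the per-rank flags (1 = free, 0 = in use). */
        int agreed = reduce_with_min ? 1 : 0;
        for (int r = 0; r < NRANKS; ++r) {
            int flag = used[r][candidate] ? 0 : 1;
            if (reduce_with_min ? (flag < agreed) : (flag > agreed)) {
                agreed = flag;
            }
        }
        if (agreed) {
            return candidate;
        }
        next = candidate + 1;  /* someone objected, try higher IDs */
    }
    return -1;
}
```

With a minimum over the flags, a single rank reporting "in use" vetoes the candidate. With a maximum, a single rank reporting "free" is enough to accept it, so a rank that already uses that CID ends up with two communicators sharing it.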
(cherry picked from commit 6c6e35b)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
@jjhursey
Member

jjhursey commented Nov 8, 2016

Might also fix #2234 - @jsquyres to check.

@jjhursey
Member

jjhursey commented Nov 8, 2016

👍 I can confirm that this fixes Issue #2380. Thanks @hjelmn!

@jsquyres
Member

jsquyres commented Nov 9, 2016

I confirm that this fixes COMM_SPAWN problems (#2234).

@jsquyres jsquyres merged commit cc10d66 into open-mpi:v2.0.x Nov 9, 2016
5 participants