v2.x communicator code updates #2215

Merged
merged 13 commits into from
Oct 31, 2016

Commits on Oct 12, 2016

  1. Remove an apparently useless function.

    (cherry picked from commit 7397276)
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    bosilca authored and hjelmn committed Oct 12, 2016
    1f21f54
  2. ompi/comm: refactor communicator cid code

    This commit simplifies communicator context ID (CID) generation by
    removing the blocking code. The high-level calls ompi_comm_nextcid
    and ompi_comm_activate remain, but they now call the non-blocking
    variants and wait on the resulting request. This removes the
    parallel code paths for context ID generation in preparation for
    further improvements to the CID generation code.
    
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    
    (cherry picked from commit 035c2e2)
    
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    hjelmn committed Oct 12, 2016
    d52a2d0
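    The blocking-over-non-blocking structure described above can be sketched as a single-process toy model. This is not Open MPI code. `toy_request_t`, `toy_nextcid_nb`, `toy_progress`, and `toy_nextcid` are hypothetical names, and a countdown of pending rounds simulates the MPI progress engine.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical toy request: a real implementation would use an
     * ompi_request_t completed by the MPI progress engine. */
    typedef struct toy_request {
        bool complete;
        int  steps_left;  /* simulated outstanding communication rounds */
        int  result;      /* context ID produced, valid once complete */
    } toy_request_t;

    /* Hypothetical non-blocking variant: starts the operation and returns
     * immediately with a request that still has to be progressed. */
    static void toy_nextcid_nb(int proposed_cid, toy_request_t *req)
    {
        assert(proposed_cid >= 0);
        req->complete = false;
        req->steps_left = 3;
        req->result = proposed_cid;
    }

    /* One pass of a simulated progress engine. */
    static void toy_progress(toy_request_t *req)
    {
        if (!req->complete && --req->steps_left == 0) {
            req->complete = true;
        }
    }

    /* Blocking entry point: start the non-blocking variant and wait on its
     * request, so there is only one code path for CID generation. */
    static int toy_nextcid(int proposed_cid)
    {
        toy_request_t req;
        toy_nextcid_nb(proposed_cid, &req);
        while (!req.complete) {
            toy_progress(&req);
        }
        return req.result;
    }
    ```

    In this shape, a fix to the non-blocking path automatically applies to the blocking call as well. Sharing one path this way is the purpose of removing the parallel code paths.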
  3. ompi/comm: improve comm_split_type scalability

    This commit introduces a new algorithm for MPI_Comm_split_type. The
    old algorithm performed an allgather on the communicator to decide
    which processes were part of the new communicators. This does not
    scale well in either time or memory.
    
    The new algorithm performs a couple of allreduce operations to determine
    the global parameters of the MPI_Comm_split_type call. If any rank gives
    an inconsistent split_type (as defined by the standard), an error is
    returned without proceeding further. The algorithm then creates a
    communicator with all the ranks that match the split_type (no
    communication required) in the same order as the original
    communicator. It then does an allgather on the new communicator (which
    should be much smaller) to determine 1) whether the new communicator is
    in the correct order, and 2) whether any ranks in the new communicator
    supplied MPI_UNDEFINED as the split_type. If either of these
    conditions is detected, the new communicator is split using
    ompi_comm_split and the intermediate communicator is freed.
    
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    (cherry picked from commit 4c49c42)
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    hjelmn committed Oct 12, 2016
    91337bf
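    The decision logic of the new MPI_Comm_split_type algorithm can be sketched as a single-process toy model. Per-rank arrays stand in for the data that the real code obtains with allreduce and allgather. The function names and the `TOY_UNDEFINED` value are assumptions for illustration only. The `node[]` array is also an assumption: it is a hypothetical stand-in for whatever locality the split_type selects, such as the shared-memory node for MPI_COMM_TYPE_SHARED.

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdbool.h>

    #define TOY_UNDEFINED (-32766)  /* hypothetical stand-in for MPI_UNDEFINED */

    /* Step 1 (allreductions in the real code): every rank must pass the same
     * split_type or TOY_UNDEFINED; otherwise the call fails early. */
    static bool split_type_consistent(const int *split_type, int n)
    {
        int lo = INT_MAX, hi = INT_MIN;
        for (int i = 0; i < n; ++i) {
            if (split_type[i] == TOY_UNDEFINED) {
                continue;
            }
            if (split_type[i] < lo) lo = split_type[i];
            if (split_type[i] > hi) hi = split_type[i];
        }
        return lo == INT_MAX || lo == hi;  /* all undefined, or all equal */
    }

    /* Steps 2-3 for rank `me`. Membership of the intermediate communicator is
     * decided locally (node[i] == node[me]) and keeps the parent order. The
     * loop then inspects what an allgather over that small communicator would
     * return. Returns true if the intermediate communicator can be kept, false
     * if it must be re-split (ompi_comm_split in the real code). */
    static bool intermediate_is_final(const int *node, const int *split_type,
                                      const int *key, int n, int me)
    {
        assert(me >= 0 && me < n);
        int prev_key = INT_MIN;
        for (int i = 0; i < n; ++i) {
            if (node[i] != node[me]) {
                continue;                   /* not in my intermediate comm */
            }
            if (split_type[i] == TOY_UNDEFINED) {
                return false;               /* condition 2: MPI_UNDEFINED */
            }
            if (key[i] < prev_key) {
                return false;               /* condition 1: key order differs */
            }
            prev_key = key[i];
        }
        return true;
    }
    ```

    Every member of an intermediate communicator sees the same allgather data, so all members reach the same decision. In the common case the intermediate communicator is kept, and the only gather runs over the small node-local group.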
  4. Silence warnings

    (cherry picked from commit 36a9063)
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    Ralph Castain authored and hjelmn committed Oct 12, 2016
    3b06c89
  5. Remove a debug print in comm_cid.c

    Back-ported from 01a653d
    
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    hjelmn committed Oct 12, 2016
    b8c9f13
  6. ompi/communicator: remove another debug print statement in ompi_comm_allreduce_intra_pmix_nb()
    
    (cherry picked from commit bbc6d4b)
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    ggouaillardet authored and hjelmn committed Oct 12, 2016
    c7d2e47
  7. Remove forced debugs

    (cherry picked from commit ba77d9b)
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    Ralph Castain authored and hjelmn committed Oct 12, 2016
    63a8c22
  8. Fix a typo that called allreduce with the allgather module.

    This was causing the CUDA collective to crash.
    
    (cherry picked from commit 61e900e)
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    sjeaugey authored and hjelmn committed Oct 12, 2016
    b56417b
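    The bug class fixed here can be illustrated with a toy collective table. Each collective stores its function pointer together with the module of the component that provided it, and different collectives may come from different components. Calling one collective with another collective's module therefore hands the callee state it does not own. `toy_coll_table_t`, `cuda_allreduce`, and the module labels are hypothetical, and the real Open MPI structures differ.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical per-component module; each component keeps its own state. */
    typedef struct toy_module {
        const char *component;  /* illustrative label only */
    } toy_module_t;

    typedef int (*toy_allreduce_fn_t)(int value, toy_module_t *module);

    /* Hypothetical per-communicator table: each collective has its own
     * function pointer AND the module of the component that provided it. */
    typedef struct {
        toy_allreduce_fn_t allreduce;
        toy_module_t *allreduce_module;
        toy_module_t *allgather_module;
    } toy_coll_table_t;

    static toy_module_t cuda_module  = { "cuda" };
    static toy_module_t tuned_module = { "tuned" };

    /* This component's allreduce assumes it receives its own module; a real
     * component would misinterpret foreign state instead of returning -1. */
    static int cuda_allreduce(int value, toy_module_t *module)
    {
        if (module != &cuda_module) {
            return -1;
        }
        return value;  /* single-process toy: the reduction is the identity */
    }

    /* Correct call: function and module are taken from the same slot. */
    static int toy_allreduce(const toy_coll_table_t *t, int value)
    {
        assert(t != NULL);
        return t->allreduce(value, t->allreduce_module);
    }
    ```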
  9. comm/cid: fix threaded CID allocation

    This commit should restore the pre-non-blocking behavior of the CID
    allocator when threads are used. There are two primary changes: 1)
    do not hold the CID allocator lock past the end of a request callback,
    and 2) if a lower-ID communicator is detected during CID allocation,
    back off and let the lower-ID communicator finish before continuing.
    
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    (cherry picked from commit fbbf743)
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    hjelmn committed Oct 12, 2016
    a0d8715
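    The back-off rule in change 2) can be sketched as a small registry of in-flight CID allocations, keyed by the parent communicator's context ID. This is a hypothetical single-threaded model of the decision only. `pending_list_t`, `may_proceed`, and the helpers are illustrative names, and the locking from change 1) appears in comments rather than in code.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define MAX_PENDING 16

    /* Hypothetical registry of in-flight allocations. In a threaded version,
     * each helper below would run under the CID allocator lock, taken and
     * released within one callback rather than held across callbacks
     * (an assumption modeled on change 1). */
    typedef struct {
        int parent_cid[MAX_PENDING];
        int count;
    } pending_list_t;

    static void pending_add(pending_list_t *p, int parent_cid)
    {
        assert(p->count < MAX_PENDING);
        p->parent_cid[p->count++] = parent_cid;
    }

    static void pending_remove(pending_list_t *p, int parent_cid)
    {
        for (int i = 0; i < p->count; ++i) {
            if (p->parent_cid[i] == parent_cid) {
                p->parent_cid[i] = p->parent_cid[--p->count]; /* swap with last */
                return;
            }
        }
    }

    /* Back-off rule: an allocation may advance only if no allocation for a
     * lower-ID parent communicator is still in flight. */
    static bool may_proceed(const pending_list_t *p, int my_parent_cid)
    {
        for (int i = 0; i < p->count; ++i) {
            if (p->parent_cid[i] < my_parent_cid) {
                return false;
            }
        }
        return true;
    }
    ```

    Under this rule every thread applies the same ordering, so the lowest-ID allocation always proceeds. That consistency is one plausible reason the back-off avoids threads waiting on each other indefinitely.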
  10. comm/cid: use ibcast to distribute result in intercomm case

    This commit updates the intercommunicator allgather to perform a
    broadcast over the local communicator as its final step. This should
    resolve a hang seen in intercommunicator tests.
    
    Signed-off-by: Nathan Hjelm <hjelmn@me.com>
    (cherry picked from commit 54cc829)
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    hjelmn committed Oct 12, 2016
    59a327e
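    The shape of an intercommunicator allgather that ends with a local broadcast can be sketched as a single-process toy model of the two groups. `side_t` and `toy_intercomm_allgather` are hypothetical names. Member 0 of each group is assumed to act as leader. The model makes no claim about the cause of the original hang.

    ```c
    #include <assert.h>
    #include <string.h>

    #define GROUP_MAX 8

    /* Hypothetical model of one side (local group) of an intercommunicator. */
    typedef struct {
        int size;
        int contrib[GROUP_MAX];          /* one value per local member */
        int leader_buf[GROUP_MAX];       /* leader's copy of remote values */
        int recv[GROUP_MAX][GROUP_MAX];  /* recv[i]: remote values at member i */
    } side_t;

    static void toy_intercomm_allgather(side_t *a, side_t *b)
    {
        assert(a->size <= GROUP_MAX && b->size <= GROUP_MAX);
        /* Steps 1-2: each leader (member 0) gathers its group's contributions
         * and exchanges the gathered block with the remote leader. */
        memcpy(a->leader_buf, b->contrib, sizeof(int) * (size_t)b->size);
        memcpy(b->leader_buf, a->contrib, sizeof(int) * (size_t)a->size);
        /* Step 3: each leader broadcasts the remote block over its LOCAL
         * group, so every member ends with the remote group's values. */
        for (int i = 0; i < a->size; ++i) {
            memcpy(a->recv[i], a->leader_buf, sizeof(int) * (size_t)b->size);
        }
        for (int i = 0; i < b->size; ++i) {
            memcpy(b->recv[i], b->leader_buf, sizeof(int) * (size_t)a->size);
        }
    }
    ```

    The final loop matches MPI's intercommunicator allgather semantics, where each process receives the contributions of the remote group.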
  11. ompi/communicator: fix typos in CID generation

    Use MPI_MIN instead of MPI_MAX where appropriate; otherwise a CID that
    is currently in use can be reused, which will likely lead to incorrect
    behavior.
    
    Refs open-mpi#2061
    
    (cherry picked from commit 3b968ec)
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    ggouaillardet authored and hjelmn committed Oct 12, 2016
    b8b5c31
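    The reason the reduction operation matters can be shown with a simplified model of an agreement step. The model assumes each rank reports 1 if the candidate CID is free locally and 0 if it is already in use. Under that assumption, MPI_MIN acts as a logical AND and accepts only if the CID is free everywhere. MPI_MAX acts as a logical OR and can accept a CID that is still in use on some rank. `toy_allreduce` and `cid_accepted` are hypothetical names, and the commit does not spell out which reductions in the CID code were changed.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    typedef enum { TOY_MIN, TOY_MAX } toy_op_t;

    /* Stand-in for an allreduce over one int contributed per rank. */
    static int toy_allreduce(const int *vals, int n, toy_op_t op)
    {
        assert(n > 0);
        int r = vals[0];
        for (int i = 1; i < n; ++i) {
            if (op == TOY_MIN ? vals[i] < r : vals[i] > r) {
                r = vals[i];
            }
        }
        return r;
    }

    /* Assumed agreement step: free_flags[i] is 1 if the candidate CID is
     * unused on rank i. MIN (logical AND) accepts only if it is free on
     * every rank; MAX (logical OR) would accept a CID still in use. */
    static bool cid_accepted(const int *free_flags, int n)
    {
        return toy_allreduce(free_flags, n, TOY_MIN) == 1;
    }
    ```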
  12. Correctly indent the code.

    (cherry picked from commit 803897a)
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    bosilca authored and hjelmn committed Oct 12, 2016
    085c8ae
  13. ompi/communicator: silence warnings

    (cherry picked from commit 6c6e35b)
    Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
    ggouaillardet authored and hjelmn committed Oct 12, 2016
    a7b8d16