
Idea: Persistent Point-to-Point Collectives #4

Open
trey-ornl opened this issue May 31, 2023 · 3 comments

@trey-ornl

I propose the following idea for a future MPI standard: persistent point-to-point collectives. The goal is to provide a flexible interface for pre-defining nearest-neighbor-like communication that allows an MPI implementation to pay most setup costs at request-creation time and to perform the communication pattern more efficiently.

Here is a straw-man API.

MPI_SEND_ADD(buf, count, datatype, dest, tag, request)
Adds a non-blocking send operation to a persistent request. Multiple operations can be added to the same request. This call would be local.

MPI_RECV_ADD(buf, count, datatype, source, tag, request)
Adds a non-blocking recv operation to a persistent request. Multiple operations can be added to the same request. This call would be local.

MPI_REQUEST_INIT(comm, request)
Makes a persistent point-to-point collective request available for use with MPI_START and MPI_WAIT. The resulting request would function like a persistent collective request. This call should come after all the ADD calls. It would be collective across the communicator.
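Here is a minimal sketch of how the calls might fit together for a 1-D halo exchange, assuming C bindings named MPI_Recv_add, MPI_Send_add, and MPI_Request_init for the straw-man routines above. None of these exist in MPI today, and the buffers, counts, neighbor ranks, and tags are placeholders.

  /* Hypothetical straw-man API; buffers, n, rank_lo/rank_hi, comm, and
     nsteps are placeholders for a 1-D halo exchange. */
  MPI_Request req = MPI_REQUEST_NULL;  /* how the request is first created is left open here */

  MPI_Recv_add(halo_lo, n, MPI_DOUBLE, rank_lo, 0, &req);
  MPI_Recv_add(halo_hi, n, MPI_DOUBLE, rank_hi, 1, &req);
  MPI_Send_add(edge_lo, n, MPI_DOUBLE, rank_lo, 1, &req);
  MPI_Send_add(edge_hi, n, MPI_DOUBLE, rank_hi, 0, &req);

  /* Collective over comm: this is where the setup costs would be paid. */
  MPI_Request_init(comm, &req);

  for (int step = 0; step < nsteps; ++step) {
    MPI_Start(&req);
    /* ... compute on interior points while the exchange proceeds ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
  }

  MPI_Request_free(&req);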

A single persistent point-to-point collective request, used with MPI_START and MPI_WAIT, would behave like the analogous array of persistent point-to-point requests used with MPI_STARTALL and MPI_WAITALL, but with the following restrictions (a sketch using the existing interfaces follows the list).

  • The destinations, sources, and tags of the sends and receives would all be required to match globally at INIT time.
  • The MPI_STATUS returned by MPI_WAIT would only support fields supported by other persistent collectives.
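For comparison, here is roughly the same exchange with existing persistent point-to-point requests; the proposed single request would behave like this array of four, modulo the restrictions above. The variable names are the same placeholders as in the earlier sketch.

  /* Existing MPI: four persistent point-to-point requests started and
     completed together. Buffers, n, rank_lo/rank_hi, comm, and nsteps
     are placeholders. */
  MPI_Request reqs[4];

  MPI_Recv_init(halo_lo, n, MPI_DOUBLE, rank_lo, 0, comm, &reqs[0]);
  MPI_Recv_init(halo_hi, n, MPI_DOUBLE, rank_hi, 1, comm, &reqs[1]);
  MPI_Send_init(edge_lo, n, MPI_DOUBLE, rank_lo, 1, comm, &reqs[2]);
  MPI_Send_init(edge_hi, n, MPI_DOUBLE, rank_hi, 0, comm, &reqs[3]);

  for (int step = 0; step < nsteps; ++step) {
    MPI_Startall(4, reqs);
    /* ... compute on interior points while the exchange proceeds ... */
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
  }

  for (int i = 0; i < 4; ++i) MPI_Request_free(&reqs[i]);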

Then the following optimizations could all happen at INIT time.

  • Global matchings of sends and receives.
  • Registration of buffers for RDMA.
  • Allocation of resources for efficient synchronization and data transfers.

This API could have the following advantages over existing non-blocking and persistent point-to-point communication.

  • Better communication performance.
  • The potential to check for deadlock or mismatched messages at INIT time.

This API could have the following advantages over persistent neighborhood collectives, while maintaining a similar opportunity for performance (a contrasting neighborhood-collective sketch follows the list).

  • Simpler-to-understand construction of requests, particularly when refactoring existing point-to-point code. It would support building a request out of familiar sends and receives instead of topology constructors.
  • The flexibility to use multiple buffers, instead of requiring single send and receive buffers.
  • No need to create a new communicator, thus avoiding the potential consumption of limited resources that a separate communicator might require.
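For contrast, here is a rough sketch of the same exchange expressed as an MPI 4.0 persistent neighborhood collective, reusing the placeholder ranks and counts from the earlier sketches. Note the separate graph-communicator construction and the single packed send and receive buffers (sendbuf and recvbuf are assumed contiguous staging buffers).

  /* MPI 4.0 persistent neighborhood collective on a distributed-graph
     communicator. rank_lo/rank_hi, n, comm, nsteps, sendbuf, and
     recvbuf are placeholders. */
  int sources[2]      = { rank_lo, rank_hi };
  int destinations[2] = { rank_lo, rank_hi };
  MPI_Comm graph_comm;
  MPI_Dist_graph_create_adjacent(comm, 2, sources, MPI_UNWEIGHTED,
                                 2, destinations, MPI_UNWEIGHTED,
                                 MPI_INFO_NULL, 0, &graph_comm);

  int counts[2] = { n, n };
  int displs[2] = { 0, n };
  MPI_Request req;
  MPI_Neighbor_alltoallv_init(sendbuf, counts, displs, MPI_DOUBLE,
                              recvbuf, counts, displs, MPI_DOUBLE,
                              graph_comm, MPI_INFO_NULL, &req);

  for (int step = 0; step < nsteps; ++step) {
    /* edge data must first be packed into sendbuf */
    MPI_Start(&req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    /* halo data must then be unpacked from recvbuf */
  }

  MPI_Request_free(&req);
  MPI_Comm_free(&graph_comm);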

Please forgive me if the MPI Forum has already investigated similar ideas.

@trey-ornl (Author)

Patrick Bridges pointed out to me that this idea is similar to point-to-point communication in NCCL, but with persistence.
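Roughly, the NCCL pattern in question groups non-blocking sends and receives into a single operation, but without a persistent handle, so the group is re-issued every iteration. The buffers, counts, peer ranks, communicator, and stream below are placeholders.

  /* NCCL grouped point-to-point (non-persistent); placeholders throughout. */
  ncclGroupStart();
  ncclRecv(halo_lo, n, ncclDouble, peer_lo, nccl_comm, stream);
  ncclRecv(halo_hi, n, ncclDouble, peer_hi, nccl_comm, stream);
  ncclSend(edge_lo, n, ncclDouble, peer_lo, nccl_comm, stream);
  ncclSend(edge_hi, n, ncclDouble, peer_hi, nccl_comm, stream);
  ncclGroupEnd();
  cudaStreamSynchronize(stream);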

@patrickb314

Trey, how would this compare to MPI neighbor collectives, particularly if creating new topologies on which to communicate didn't require communicator creation?

@trey-ornl (Author)

If persistent neighborhood collectives could take a topology instead of a communicator, I think that two of the three advantages of this proposal still remain.

  • Simpler-to-understand construction of requests, particularly when refactoring existing point-to-point code. It would support building a request out of familiar sends and receives instead of topology constructors.
  • The flexibility to use multiple buffers, instead of requiring single send and receive buffers.
