
Semantics of MPI Rank Parameters #138

Open
omor1 opened this issue Jun 4, 2019 · 16 comments
Labels
mpi-5 For inclusion in the MPI 5.0 standard wg-terms Semantic Terms Working Group

Comments

@omor1
Member

omor1 commented Jun 4, 2019

Problem

MPI variously defines 'rank' parameters (e.g. source, destination, root, and target) as 'integer' or 'non-negative integer', occasionally using both within the same chapter. There is no readily apparent reason for this inconsistency, and the definition should be unified, perhaps within the context of the Semantic Terms section.

Chapters describing ranks as 'integer':

  • Point-to-Point Communication (§3)
  • Collective Communication (§5)
  • Groups, Contexts, Communicators, and Caching (§6)

Chapters describing ranks as 'non-negative integer':

  • One-Sided Communications (§11)

Chapters using both descriptions:

  • Process Topologies (§7)

The above isn't meant to be a comprehensive list, but does adequately exhibit the problem.

Proposal

Define 'rank' as an integer in the range 0 to N-1 inclusive, where N is the size of the group or communicator. Should #87 be accepted, this definition can be expanded to include MPI_PROC_NULL.

Changes to the Text

The Semantic Terms section will need to be updated with the definition for 'rank' and all current descriptions of 'rank' parameters will be changed correspondingly.

Impact on Implementations

None

Impact on Users

None, though it may become somewhat clearer what values are permitted for rank parameters.

Alternatives

More conservatively, either 'integer' or 'non-negative integer' may be chosen as the definitive description of the type of a rank parameter and used throughout the standard. I recommend 'integer', since MPI_PROC_NULL and MPI_ANY_SOURCE are commonly implemented as negative integers.

@omor1
Member Author

omor1 commented Jun 4, 2019

Note that as of this writing, #87 is rather narrow; a new version with a wider intent (to clarify where, throughout the standard, null processes are permitted) will be written shortly and the issue updated accordingly.

@dholmes-epcc-ed-ac-uk
Member

We should give serious consideration to the idea of adding a new MPI type: MPI_Rank.

// possible mpi.h extract
typedef uint64_t MPI_Rank;
#define MPI_PROC_NULL ((MPI_Rank)((UINT64_C(1) << 63) - 1))  /* 2^63 - 1 */
int MPI_Send(..., MPI_Rank dest, ...);
// possible alternative mpi.h extract
typedef int MPI_Rank;
#define MPI_PROC_NULL ((MPI_Rank)-1)
int MPI_Send(..., MPI_Rank dest, ...);
int main(void) {
    MPI_Rank myRank, wSize;
    MPI_Init(NULL, NULL);
    /* MPI_Comm_rank/MPI_Comm_size would need MPI_Rank out-parameters */
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    MPI_Comm_size(MPI_COMM_WORLD, &wSize);
    if ((MPI_Rank)0 == myRank) {
        for (MPI_Rank r = (MPI_Rank)1; r < wSize; ++r) {
            MPI_Recv(..., r, ...);
        }
    } else {
        MPI_Send(..., (MPI_Rank)0, ...);
    }
    MPI_Finalize();
    return 0;
}

@tonyskjellum

tonyskjellum commented Jun 4, 2019 via email

@dholmes-epcc-ed-ac-uk
Member

@tonyskjellum
0) why clarify as a signed integer? It should be "MPI_Rank is any non-negative integer in the range from 0 to size-1 or MPI_PROC_NULL" with caveats like "Performing arithmetic operations with MPI_PROC_NULL is erroneous because it will yield undefined results."

  1. Compatibility is compromised by all the casting. Code will not be portable (without conversion/truncation warnings or errors at compile time) unless properly cast, as shown in my example.
  2. yay!
  3. yay!
  4. depends on (0)
  5. re: MPI_MAX_RANK - cf. MPI_TAG_UB - we would want to be able to query the maximum rank value and we would want to standardise the minimum number that must be supported for that maximum number of ranks. Possibly we want MPI_MAX_GROUP_SIZE or MPI_MAX_COMM_SIZE instead? MPI_MAX_RANK can change when additional MPI processes become connected, e.g. via the Dynamic Model (spawn, connect, join, etc).
  6. By setting MPI_MAX_RANK/MPI_MAX_COMM_SIZE you mean per-implementation, right? If we mandate a value in the MPI Standard then every MPI library must support up to that many and no MPI library is permitted to support more than that many.

Q: if I use MPI_COMM_CONNECT/MPI_COMM_ACCEPT to repeatedly add new MPI jobs to my universe and I keep merging the intercomms into intracomms forcing MPI to find bigger numbers for rank, what happens when I force MPI to exceed MPI_MAX_RANK/MPI_MAX_COMM_SIZE? Is it an error of class MPI_ERR_MAX_RANK/MPI_ERR_MAX_COMM_SIZE? Does MPI_MAX_RANK/MPI_MAX_COMM_SIZE depend on how much memory MPI has already used?

@jsquyres
Member

jsquyres commented Jun 4, 2019

If your goal is to have sizeof(MPI_Rank) > sizeof(int) someday, then if #137 (BigCount) is accepted into MPI-4, you're going to have a nightmare of overloaded bindings -- particularly in C.

Remember: one of the key requirements for making C11 _Generic workable is that all "count" arguments will be int or all "count" arguments will be MPI_Count -- we're not supporting multiple types of "count" argument in the same function call. If you need to overload another parameter type such as MPI_Rank, you're creating a nightmare for C11 _Generic, and indeed complicating life for Fortran and C++ as well.

For example, here's the "simple" case of C++ function overloading for MPI_Accumulate (even abiding by the "all 'count' params will be the same type" restriction):

MPI_Accumulate(const void *origin_addr, int origin_count, MPI_Datatype origin_datatype, int target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype, MPI_Op op, MPI_Win win);
MPI_Accumulate(const void *origin_addr, MPI_Count origin_count, MPI_Datatype origin_datatype, int target_rank, MPI_Aint target_disp, MPI_Count target_count, MPI_Datatype target_datatype, MPI_Op op, MPI_Win win);
MPI_Accumulate(const void *origin_addr, int origin_count, MPI_Datatype origin_datatype, MPI_Rank target_rank, MPI_Aint target_disp, int target_count, MPI_Datatype target_datatype, MPI_Op op, MPI_Win win);
MPI_Accumulate(const void *origin_addr, MPI_Count origin_count, MPI_Datatype origin_datatype, MPI_Rank target_rank, MPI_Aint target_disp, MPI_Count target_count, MPI_Datatype target_datatype, MPI_Op op, MPI_Win win);

Things get dicey with C11 _Generic -- I don't know exactly what that would look like, but I suspect you would have to nest them...? 🤷‍♂

@dholmes-epcc-ed-ac-uk
Member

@jsquyres this is precisely my reason for bringing up this idea (again; this is not its first outing/rodeo). This is the combinatorial problem that drove us towards the function pointers interface.

This is also the only opportunity we have to say "new binding or old binding; make your choice". Old = like now. New = MPI_COUNT and MPI_RANK for all appropriate parameters. No mixing. That is, we only support two backend signatures: your first one and your last one.

@dholmes-epcc-ed-ac-uk
Member

Also, anything that currently takes int but is actually asking for a displacement or an offset should be upgraded to MPI_AINT in the same move. Still one new backend signature.

@omor1
Member Author

omor1 commented Jun 4, 2019

@jsquyres it's possible to nest _Generic, but it's very clumsy and clunky.

void foo_int_int(int X, int Y);
void foo_int_long(int X, long Y);
void foo_long_int(long X, int Y);
void foo_long_long(long X, long Y);

#define foo_long(X, Y) _Generic((Y), \
        long:    foo_long_long,      \
        int:     foo_long_int,       \
        default: foo_long_int        \
    )(X, Y)

#define foo_int(X, Y) _Generic((Y), \
        long:    foo_int_long,      \
        int:     foo_int_int,       \
        default: foo_int_int        \
    )(X, Y)

#define foo(X, Y) _Generic((X),  \
        long:    foo_long(X, Y), \
        int:     foo_int(X, Y),  \
        default: foo_int(X, Y)   \
    )

@omor1
Member Author

omor1 commented Jun 4, 2019

Also note that whether MPI should have MPI_Rank belongs in a different issue; this proposal is meant to be very narrow in scope.

@tonyskjellum

tonyskjellum commented Jun 4, 2019 via email

@hjelmn

hjelmn commented Jun 4, 2019

@jsquyres This is one of the reasons it might just be time to break API. Leave MPI-4.x alone and define all new functions going forward as using MPI_Rank for the rank and MPI_Count for counts. Anything else is a hack at best.

@tonyskjellum

tonyskjellum commented Jun 4, 2019 via email

@jeffhammond
Member

MPI_Rank is a ridiculous idea and completely irrelevant to the very small and obvious clarification of what a rank is. Please do not burden this ticket with addressing this.

We are nowhere near needing to have a rank that is larger than INT_MAX. The largest MPI simulation in history is less than 8M ranks (sequoia-ross-pads-2013.pdf) and nobody is talking about building machines that will exceed that (rather, the trend is the opposite).

@omor1
Member Author

omor1 commented Jun 4, 2019

@jeffhammond I can see the utility of specifying a rank type separate from a general integer, from an API and application maintainability perspective, but I entirely agree that it should be discussed in a different ticket (especially since it will undoubtedly prove incredibly controversial). I doubt anyone is seriously arguing that we need support for 64-bit ranks, other than from the perspective of being 64-bit clean. Even the cluster with the most cores (Sunway TaihuLight) has several orders of magnitude fewer cores than a 32-bit signed integer can represent. Of course, int could be only 16 bits on some systems, but I'm unaware of any modern platform where it is.

@jsquyres
Member

jsquyres commented Jun 5, 2019

I don't really have an opinion here on MPI_Rank in particular (although I agree that there is no machine on the roadmap today that will have more than 2B MPI processes).

My understanding of the rationale of this proposal is twofold:

  1. Do it now (i.e., MPI-4) when BigCount is actually likely going to happen -- doing it later will create a C11 _Generic hot mess.
  2. The reason to have MPI_Rank (and MPI_Tag? and MPI_Displacement? and ...?) is to divorce the MPI types from the underlying language, thereby paving the way for supporting more languages in the future (e.g., see slides 38-43 in https://github.com/mpi-forum/mpi-forum.github.io/blob/master/slides/2019/05/2019-05-30-BigCount-solutions-for-MPI-4.pdf)

@omor1
Member Author

omor1 commented Jun 5, 2019

In any case, discussion of MPI_Rank should move to another issue once a concrete proposal is submitted; it's not what this issue is about. This issue concerns a minor inconsistency in the standard; let's refocus on that.

@wesbland wesbland added wg-terms Semantic Terms Working Group mpi-5 For inclusion in the MPI 5.0 standard labels Jun 14, 2023

7 participants