Format specifiers for MPI types #107

Open
jdinan opened this issue Sep 20, 2018 · 25 comments
Labels: mpi-5 (For inclusion in the MPI 5.0 standard), wg-languages (Languages Working Group)

Comments

@jdinan

jdinan commented Sep 20, 2018

Problem

It's not easy to perform I/O on MPI_Count, e.g. with printf or scanf.

Proposal

Similar to inttypes.h, add MPI_PRI_COUNT and MPI_SCN_COUNT format specifiers to mpi.h.
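
As a rough illustration of the proposal, the sketch below mirrors the inttypes.h convention. Every name prefixed demo_ or DEMO_ is hypothetical: demo_count_t stands in for MPI_Count (assumed here to be a 64-bit signed integer, which the standard does not guarantee) so that the example compiles without mpi.h, and DEMO_PRI_COUNT and DEMO_SCN_COUNT stand in for the proposed MPI_PRI_COUNT and MPI_SCN_COUNT.

```c
#include <assert.h>    /* static_assert */
#include <inttypes.h>  /* PRId64, SCNd64 */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for MPI_Count so the sketch compiles without mpi.h. */
typedef int64_t demo_count_t;

/* Hypothetical macros in the style of inttypes.h: no leading '%', so they
   compose with flags and field widths through string concatenation. */
#define DEMO_PRI_COUNT PRId64
#define DEMO_SCN_COUNT SCNd64

/* MPI requires MPI_Count to be able to hold any int value. */
static_assert(sizeof(demo_count_t) >= sizeof(int), "count type must hold any int");

/* Format a count into buf; returns what snprintf returns. */
int demo_format_count(char *buf, size_t len, demo_count_t count)
{
    return snprintf(buf, len, "count=%" DEMO_PRI_COUNT, count);
}

/* Parse a count from text; returns 1 on success, 0 otherwise. */
int demo_parse_count(const char *text, demo_count_t *out)
{
    return sscanf(text, "%" DEMO_SCN_COUNT, out) == 1;
}
```

Because the macros omit the leading '%', a caller could also write something like `printf("%12" DEMO_PRI_COUNT "\n", n)` to add a field width, just as with PRId64.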

Changes to the Text

TBD

Impact on Implementations

Should be limited to header files.

Impact on Users

Users don't need to figure out the format specifier based on the size and signedness of the type.

References

@jdinan
Author

jdinan commented Sep 20, 2018

Comment from discussion on 9/20/2018: This also raises a question about interoperability between MPI and C library routines that operate on C standard types (e.g. printf and scanf). Defining format specifiers implies that there is a correspondence between each MPI type and a C standard type.

@jdinan
Author

jdinan commented Sep 20, 2018

@mahermanns and @dholmes-epcc-ed-ac-uk Thanks for volunteering to further discussion on this ticket.

@dholmes-epcc-ed-ac-uk
Member

dholmes-epcc-ed-ac-uk commented Sep 20, 2018

That is the right question (Bill, Sept 2018)

Should MPI replace MPI_Count with size_t in all C API definitions, and with whatever native Fortran datatype is natural for the intended usage in each situation in all Fortran API definitions?

Should MPI replace MPI_Aint with ptrdiff_t in all C API definitions, and with whatever native Fortran datatype is natural for the intended usage in each situation (which may not exist in all versions of Fortran!) in all Fortran API definitions?

The consequences of this counter-proposal are that no such format specifiers are needed, the arithmetic functions MPI_Aint_add and MPI_Aint_diff are no longer needed, the Big MPI proposal is no longer needed (as currently specified), and <other benefits>.

@bosilca
Member

bosilca commented Sep 20, 2018

MPI_Aint to ptrdiff_t would be more accurate. But otherwise +1.

@dholmes-epcc-ed-ac-uk
Member

Thanks @bosilca - I knew that such a type must exist but could not think of the type name at the time I wrote the comment.

@jeffhammond
Member

@jdinan Is it really that hard? We know from MPI-3.1 Section 2.5.8 that MPI_Count must be signed, because

it must be minimally capable of encoding any value that may be stored in a variable of type int

so one should only need to verify that off_t and ptrdiff_t are the same size and then use %zd or PRId64.

In any case, I fail to see any utility in truncating words in MPI_PRI_COUNT and MPI_SCN_COUNT. Just use MPI_PRINT_COUNT and MPI_SCAN_COUNT. The result is significantly more readable and adds only 3 bytes to the size of mpi.h.

@dholmes-epcc-ed-ac-uk
Member

@jeffhammond given that these format specifiers only apply to the printf and scanf functions (with variants, such as vsprintf?) then we should probably include that extra F to make it 20% clearer:
MPI_PRINTF_COUNT
MPI_SCANF_COUNT

Dumb question: will these ever be different to each other? Do we need two/both?

What is the Fortran equivalent? The "I" descriptor seems old, i.e. F77 era.

@jeffhammond
Member

@dholmes-epcc-ed-ac-uk Fortran does not standardize a preprocessor so it doesn't really matter.

@jdinan
Author

jdinan commented Sep 25, 2018

These should follow the convention used in inttypes.h for print and scan format specifiers. These can be used in any of the functions in the printf and scanf family (see the link above for info on the inttypes header).

@jeffhammond Yes, it really is this hard if you want portability. In C, the standard integer type binary format is implementation defined, but the fixed width integer types must be two's complement. It is therefore possible to have two different signed integer representations and a user will not know which one should be used with MPI_Count.

@jdinan
Author

jdinan commented Sep 25, 2018

We can't use C size_t and ptrdiff_t because of heterogeneity support and language interoperability.

@jeffhammond
Member

jeffhammond commented Sep 25, 2018

@jdinan This should be fixed in C20/C++20.

We could also just preemptively stipulate that the MPI standard requires two's complement integers, because literally no system outside of Unisys supports anything else, and even there only in the context of FPGA emulation of legacy code that can't be migrated to x86_64 (see aforementioned documents for details).

@mhoemmen

mhoemmen commented Sep 25, 2018

@jeffhammond FYI if you want the latest version of a paper, use the wg21.link/p0907 link; it automatically resolves to the most recent submitted version. P0907 is on R3 now. Also it's been forwarded to Core, but I'm not sure of current status for C++20.

@mahermanns
Member

I think using size_t and ptrdiff_t in the API is a different discussion.

I think as MPI introduces the typedef, it should also be MPI defining the format specifier (apart from how difficult it is or whether it is possible at all).

Using the PRI abbreviation would follow the principle of least astonishment. However, as we are deviating from the original naming anyway (with the second underscore and all uppercase), it may indeed be better to expand the names to MPI_PRINT_COUNT and MPI_SCAN_COUNT (I am also not a friend of abbreviating variable names unnecessarily). Then again, naming them MPI_PRI_COUNT and MPI_SCN_COUNT may set them apart enough from other MPI constants to foster intuitive recognition.

@dholmes-epcc-ed-ac-uk
Member

@jdinan has this problem gone away? (I know that the answer has to be "no" because no changes have been made to address it, but no-one has commented on this issue since 2018, so it is obviously not particularly pressing.)

Is there still interest in doing something about this for the mpi-4.0 release? If so, the clock is ticking rapidly.

wesbland added the no-wg (Discussion doesn't have a current working group) label and removed the wg-large-counts (Large Counts Working Group) label on Nov 18, 2020
@jdinan
Author

jdinan commented Jan 5, 2021

@dholmes-epcc-ed-ac-uk No, this hasn't been fixed. This issue could be a good first proposal for any Forum members that are looking to get their feet wet introducing a new proposal to the MPI Forum.

wesbland added the wg-languages (Languages Working Group) and mpi-4.1 (For inclusion in the MPI 4.1 standard) labels and removed the no-wg (Discussion doesn't have a current working group) and mpi <next> labels on Jul 21, 2021
@raffenet

Just as reference, MPICH has provided these (in mpi.h) for some time.

/* FIXME: The following two definition are not defined by MPI and must not be
   included in the mpi.h file, as the MPI namespace is reserved to the MPI
   standard */
#define MPI_AINT_FMT_DEC_SPEC "%ld"
#define MPI_AINT_FMT_HEX_SPEC "%lx"

Note the actual specifiers are filled in by configure.

@wesbland
Member

I’m going to propose moving this to MPI 5.0. There’s more discussion to be had here. If someone objects and thinks we’ll be ready to read this soon, leave a comment and we can discuss bringing it back into MPI 4.1.

wesbland added the mpi-5 (For inclusion in the MPI 5.0 standard) label and removed the mpi-4.1 (For inclusion in the MPI 4.1 standard) label on Jul 14, 2022
@jdinan
Author

jdinan commented Sep 30, 2022

To folks that have asked: no, the problem has not gone away, since users don't know which C integer type a given MPI integer type maps to. If another Forum member has cycles to pick up this issue (should be a relatively easy one), please feel free to do so.

@jeffhammond
Member

Can one just default to %llu and promote it, if it's not 64b?
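
A minimal sketch of this promote-and-print idea might look as follows. The names demo_count_t and demo_format_promoted are hypothetical, with demo_count_t standing in for MPI_Count so the example compiles without mpi.h. The sketch uses %lld rather than %llu on the assumption, discussed earlier in the thread, that MPI_Count is signed.

```c
#include <assert.h>  /* static_assert */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for MPI_Count; the real type is chosen by the MPI library. */
typedef int64_t demo_count_t;

/* The cast below is lossless only if long long is at least as wide as the type. */
static_assert(sizeof(long long) >= sizeof(demo_count_t), "long long too narrow");

/* Promote to long long and print with a fixed, portable conversion. */
int demo_format_promoted(char *buf, size_t len, demo_count_t count)
{
    return snprintf(buf, len, "%lld", (long long)count);
}
```

The static assertion makes the size assumption explicit, so a platform with a wider count type would fail at compile time instead of silently truncating.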

@jdinan
Author

jdinan commented Oct 24, 2022

If you used the same approach with scanf, it would be difficult to detect whether the value is truncated.
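
To make the truncation concern concrete, here is a sketch of what a user would have to write by hand: read into a wider type, then range-check before narrowing. It is only one possible workaround and it uses strtoll instead of scanf for the conversion. The names demo_small_t and demo_parse_checked are hypothetical, and demo_small_t is assumed to be a 32-bit type purely so that truncation is possible in the example.

```c
#include <assert.h>  /* static_assert */
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an MPI integer type narrower than long long. */
typedef int32_t demo_small_t;

static_assert(sizeof(long long) > sizeof(demo_small_t),
              "sketch assumes a strictly wider intermediate type");

/* Parse decimal text via long long, then reject values that would not
   survive the narrowing conversion. Trailing characters after the number
   are ignored. Returns 1 on success, 0 on failure; *out is untouched on failure. */
int demo_parse_checked(const char *text, demo_small_t *out)
{
    char *end = NULL;
    errno = 0;
    long long wide = strtoll(text, &end, 10);
    if (end == text || errno == ERANGE)
        return 0;                      /* no digits, or overflowed long long */
    if (wide < INT32_MIN || wide > INT32_MAX)
        return 0;                      /* value would be truncated */
    *out = (demo_small_t)wide;
    return 1;
}
```

The range check depends on knowing the limits of the target type, which is exactly the information users lack for the MPI integer types.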

@jeffhammond
Member

jeffhammond commented Jan 2, 2023

One could also determine the size of an integer using sizeof and whether it is signed using this.

With C++, it seems straightforward to deduce the printf formats from typeid. See below.

One can write something similar with the GNU C extension typeof, which is expected to be in C23. I assume there is a way to do it with _Generic as well, but I haven't tried.

As for scanf, I would expect binary file I/O to need to store the type information if one cannot assume 64-bit values.

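// Caution: std::type_info::name() is implementation-defined (mangled on GCC and
// Clang), so the format strings built below are not portable.
#include <cstdio>       // printf
#include <string>       // std::string
#include <type_traits>  // std::is_signed
#include <typeinfo>     // typeid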

#include <mpi.h>

int main(void)
{
    MPI_Count  c = 5;
    MPI_Aint   a = 6;
    MPI_Offset o = 7;

    std::string ff{"C=%"+std::string{typeid(MPI_Count).name()}+"\n"};
    printf(ff.c_str(),c);

    std::string gg{"A=%"+std::string{typeid(MPI_Aint).name()}+( std::is_signed<MPI_Aint>() ? "d" : "u")+"\n"};
    printf(gg.c_str(),a);

    std::string hh{"O=%"+std::string{typeid(MPI_Offset).name()}+"\n"};
    printf(hh.c_str(),o);

    return 0;
}
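
For the _Generic route mentioned above, a C11 sketch could look like the following. This is an untested-in-MPI illustration, not anything provided by an MPI library: DEMO_FMT, DEMO_FORMAT, and demo_count_t are hypothetical names, and demo_count_t stands in for MPI_Count so the example compiles without mpi.h.

```c
#include <assert.h>  /* static_assert */
#include <stdio.h>

/* Select a printf conversion from the type of an expression. Only the
   standard integer types of rank int and above are listed; typedefs such
   as int64_t resolve to one of them. Any other type is a compile error. */
#define DEMO_FMT(x) _Generic((x), \
    int: "%d", unsigned int: "%u", \
    long: "%ld", unsigned long: "%lu", \
    long long: "%lld", unsigned long long: "%llu")

/* Format x into buf using the conversion selected for its type. */
#define DEMO_FORMAT(buf, len, x) snprintf((buf), (len), DEMO_FMT(x), (x))

/* Hypothetical stand-in for MPI_Count. */
typedef long long demo_count_t;

static_assert(sizeof(demo_count_t) >= sizeof(int), "count type must hold any int");
```

A call such as `DEMO_FORMAT(buf, sizeof buf, c)` then works for any listed integer type. Note that the selected string is not a literal at the point of use, so it cannot be concatenated with other literals the way the inttypes.h macros can.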

@mhoemmen

mhoemmen commented Jan 2, 2023

@jeffhammond wrote:

With C++, it seems straightforward to deduce the printf formats from typeid. See below.

In C++23, I would use std::print, and in C++20, I would use std::format. These Solve the Problem without you needing to know the printf format specifier. If you need to support earlier C++ versions, you can use the {fmt} library. (C++20 and C++23 standardized these parts of the {fmt} library.)

If I had to use printf, Jeff's typeid-based approach works, but please note that the result of std::type_info::name() is mangled and not standard. GCC offers a demangling function ( https://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html ); other compilers probably also do that.

Can one just default to %llu and promote it, if it's not 64b?

If it's actually a pointer, reinterpret_cast<ptrdiff_t>(p) would get you a signed integer, in which case I would use t instead of ll.

Please don't use intmax_t (see e.g., https://thephd.dev/intmax_t-hell-c++-c ).

@jeffhammond
Member

We might end up standardizing the C type of these types for the ABI, so maybe it won't be so bad in the future.

@jeffhammond
Member

I withdraw my prior objections to this proposal. We should do this, and it is especially important for MPI_Count, because it is likely going to be the wider of intptr_t and int64_t and thus it's going to be annoying for users to printf these.
