Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mpi: embiggening #4961

Merged
merged 11 commits into from
Feb 20, 2021
Merged

mpi: embiggening #4961

merged 11 commits into from
Feb 20, 2021

Conversation

hzhou
Copy link
Contributor

@hzhou hzhou commented Dec 17, 2020

Pull Request Description

This is a work-in-progress to implement the MPI 4.0 large count.

[skip warnings]

Expected Impact

Author Checklist

  • Reference appropriate issues (with "Fixes" or "See" as appropriate)
  • Remove xfail from the test suite when fixing a test
  • Commits are self-contained and do not do two things at once
  • Commit message is of the form: module: short description and follows good practice
  • Passes whitespace checkers
  • Passes warning tests
  • Passes all tests
  • Add comments such that someone without knowledge of the code could understand
  • You or your company has a signed contributor's agreement on file with Argonne
  • For non-Argonne authors, request an explicit comment from your companies PR approval manager

@hzhou
Copy link
Contributor Author

hzhou commented Dec 17, 2020

Special warning test to warn about integer conversion (-Wconversion)

test:mpich/warnings/int

@hzhou hzhou force-pushed the 2012_embiggening branch 6 times, most recently from 7356a27 to 0134c73 Compare December 17, 2020 05:56
@hzhou
Copy link
Contributor Author

hzhou commented Dec 17, 2020

test:mpich/warnings/int

1 similar comment
@hzhou
Copy link
Contributor Author

hzhou commented Dec 17, 2020

test:mpich/warnings/int

@hzhou hzhou force-pushed the 2012_embiggening branch 10 times, most recently from 2f8c1fa to 664cb1b Compare December 26, 2020 02:42
raffenet
raffenet previously approved these changes Jan 13, 2021
Copy link
Contributor

@raffenet raffenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me.

maint/gen_binding_c.py Outdated Show resolved Hide resolved
@hzhou
Copy link
Contributor Author

hzhou commented Jan 13, 2021

Looks OK to me.

Thanks. I'll split some of the warnings change for you to review and approve again. The embiggening is not done yet.

@hzhou hzhou force-pushed the 2012_embiggening branch 4 times, most recently from c6aabbc to 41ea836 Compare February 17, 2021 03:03
@hzhou hzhou marked this pull request as ready for review February 17, 2021 03:05
@hzhou hzhou force-pushed the 2012_embiggening branch 6 times, most recently from de2d464 to e2bddb9 Compare February 18, 2021 00:38
@hzhou hzhou dismissed raffenet’s stale review February 18, 2021 01:20

significant changes

@hzhou
Copy link
Contributor Author

hzhou commented Feb 18, 2021

test:mpich/ch3/most
test:mpich/ch4/most

Copy link
Contributor

@raffenet raffenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see much to object to here. I'm a bit confused by the commit I commented on, both otherwise seems OK.

void MPIR_Type_free_impl(MPI_Datatype * datatype);
void MPIR_Pack_size_impl(int incount, MPI_Datatype datatype, MPI_Aint * size);
void MPIR_Pack_size(MPI_Aint incount, MPI_Datatype datatype, MPI_Aint * size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure exactly what's happening in this commit. It seems datatype _impl functions are becoming wrappers where they weren't before.
For example:
(old) MPI_Pack_size -> MPIR_Pack_size_impl
(new) MPI_Pack_size -> MPIR_Pack_size -> MPIR_Pack_size_impl
Is this so small and large (_c) versions can be treated differently before calling the _impl API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure exactly what's happening in this commit. It seems datatype _impl functions are becoming wrappers where they weren't before.
For example:
(old) MPI_Pack_size -> MPIR_Pack_size_impl
(new) MPI_Pack_size -> MPIR_Pack_size -> MPIR_Pack_size_impl
Is this so small and large (_c) versions can be treated differently before calling the _impl API?

In the large branch we may swap the array or may not, depending on whether MPI_Aint is the same size as MPI_Count. When we do the swap, the parameter has to be renamed. This becomes awkward for inlined "body of routines" since to have two large_count versions with different variable names seem just silly. Thus it seems much easier to just move the inline code block into _impl wrapper, which can be handled automatically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Can you add this more detailed description to the commit message? At the moment

datatype: use MPI_Aint for internal impl functions

lacks information about the addition of the wrappers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@hzhou
Copy link
Contributor Author

hzhou commented Feb 19, 2021

test:mpich/ch4/ofi

@hzhou hzhou requested a review from raffenet February 19, 2021 21:35
Copy link
Contributor

@raffenet raffenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor nitpick.

Comment on lines 49 to 50
if (mpi_errno)
goto fn_fail;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: should this be MPI_ERR_CHECK(mpi_errno)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see this was just copy/paste from before. No need to fix now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll fix it before merging.

Need use MPI_Aint internally for buffer sizes to accommodate large
count.
This function must be a leftover from development. It has never been
used and I don't think it is complete anyway.
Need differentiate internal func attributes from the user configured
attributes.
While it is technically safe, having the same temparary name within
nested scope is confusing. Rename to clarity.
This will be used when user use a small function and the value returned
won't fit in the output parameter.
We will call dump_mpi_c twice for functions with POLY types. First we
call with map_type="SMALL", then call with map_type="BIG".
Functions without POLY types will just call dump_mpi_c once with
map_type="SMALL".

First of all, if there is a custom "large_count" block, we'll always
use that in large version.

If a POLY function only has scalar POLY type, we assume the
implementation use MPI_Aint -- need fix any exceptions.

If a POLY function has scalar output or inout POLY type, we assume
internally use MPI_Aint, and python generrates temporary MPI_Aint
variable to copy in and out value in the *small* version.

If the function uses POLY arrays, we will use separate internal routines
for datatype functions and always use MPI_Aint for collective (and other
if any) functions. The array swap can be messy due to function-specific
array lengths. We took the way out by only doing array swap in
v-collectives.

large mpir impl version with `_c_impl` suffix, and will call that. The
compilation will fail if that (_c_impl) routine is missing.

When we only use MPI_Aint for internal impl functions, we also check and
convert when MPI_Count is bigger than MPI_Aint (such as the case in
32-bit systems).
The parameter mapping is passed along only because a few lower level
function need it. Since the mapping concept is obscure at higher level,
having it in the interface makes the code obscrue. On the other hand, a
simpl string parameter "map_type" is more intuitive to understand. This
commit uses "map_type" or simply omits mapping whereever the mapping is
not directly needed.
... that has output counters, e.g. MPI_Get_count, MPI_Get_elements,
MPI_Pack, MPI_Unpack, etc. We need swap pointers when the type
`MPI_Count` and `MPI_Aint` are not the same size, or we may skip the
swap. The swapping will require use of different variables. This become
difficult to handle with inlined body-of-routines. Wrap them inside an
_impl function allows both cases to be handled properly.
Unfortunately our fortran bindings uses _c_impl and _c to refer to C
versions of Fortran binding, causing conflicts with the large count
interface. Since it is easier to change the C bindings, we use
_large_impl to differentiate. I think it is more readable than _c suffix
anyway.
@hzhou
Copy link
Contributor Author

hzhou commented Feb 20, 2021

test:mpich/ch3/most
test:mpich/ch4/most

@hzhou hzhou merged commit c23e573 into pmodels:main Feb 20, 2021
@hzhou hzhou deleted the 2012_embiggening branch February 20, 2021 05:55
@hzhou
Copy link
Contributor Author

hzhou commented May 29, 2021

Reference #4880

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants