Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coll: internally use MPI_Aint for count parameters #5044

Merged
merged 4 commits into from
Feb 12, 2021

Conversation

hzhou
Copy link
Contributor

@hzhou hzhou commented Feb 2, 2021

Pull Request Description

Make collective internally use MPI_Aint for count parameters.

[skip warnings]

Expected Impact

Author Checklist

  • Reference appropriate issues (with "Fixes" or "See" as appropriate)
  • Remove xfail from the test suite when fixing a test
  • Commits are self-contained and do not do two things at once
  • Commit message is of the form: module: short description and follows good practice
  • Passes whitespace checkers
  • Passes warning tests
  • Passes all tests
  • Add comments such that someone without knowledge of the code could understand
  • You or your company has a signed contributor's agreement on file with Argonne
  • For non-Argonne authors, request an explicit comment from your companies PR approval manager

@hzhou hzhou force-pushed the 2101_coll_large branch 2 times, most recently from 1465ff5 to 5c7c6be Compare February 2, 2021 05:50
@hzhou
Copy link
Contributor Author

hzhou commented Feb 2, 2021

test:mpich/ch3/most
test:mpich/ch4/most

@hzhou
Copy link
Contributor Author

hzhou commented Feb 2, 2021

@raffenet I am going to make the counts array argument (e.g. MPI_Gatherv) internally to use MPI_Aint) next. That potentially need more debate than this one, which simply replaces the scalar part. Would you like to review this one first or would you wait for the counts array changes and review them together?

raffenet
raffenet previously approved these changes Feb 3, 2021
Copy link
Contributor

@raffenet raffenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's review the arrays separately. This one LGTM.

@hzhou
Copy link
Contributor Author

hzhou commented Feb 3, 2021

Let's review the arrays separately. This one LGTM.

As I continue to work on counts array, I realzed that missed a ton of collective functions inside the ch4 since we have entire sets coll netmod/shmmod APIs as well. So this PR is incomplete even for just scalar count, and it reveals that without strict compiler check (which we can't due to unclean main) there is no easy way to tell whether the PR is complete. Let me continue and do the largification of counts array as well. That will help me identify all the places. Then if we decide to only do scalar, we can easily just drop the counts array commit.

@hzhou hzhou force-pushed the 2101_coll_large branch 2 times, most recently from df18568 to 639cce8 Compare February 3, 2021 17:41
@hzhou
Copy link
Contributor Author

hzhou commented Feb 3, 2021

test:mpich/ch3/most
test:mpich/ch4/most

@hzhou hzhou force-pushed the 2101_coll_large branch 2 times, most recently from a7fce7e to d5d8a68 Compare February 3, 2021 19:37
@hzhou
Copy link
Contributor Author

hzhou commented Feb 3, 2021

test:mpich/ch3/most
test:mpich/ch4/most

@hzhou
Copy link
Contributor Author

hzhou commented Feb 4, 2021

test:mpich/ch3/most
test:mpich/ch4/most

@hzhou
Copy link
Contributor Author

hzhou commented Feb 4, 2021

test:mpich/ch4/most

@hzhou
Copy link
Contributor Author

hzhou commented Feb 4, 2021

test:mpich/ch3/most

@hzhou hzhou requested a review from raffenet February 5, 2021 01:56
@hzhou hzhou dismissed raffenet’s stale review February 5, 2021 01:57

Commits significantly updated

Copy link
Contributor

@raffenet raffenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still prefer evaluating the v-collective changes separately. The rest of the PR looks pretty good.

@@ -68,9 +68,14 @@ static int win_allgather(MPIR_Win * win, size_t length, uint32_t disp_unit, void
MPIDI_UCX_CHK_STATUS(status);
}

MPI_Datatype aint_type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just use MPI_AINT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from that, I don't really understand what changed here. rkey_sizes is the data. It shouldn't have anything to do with large count interfaces.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see it now. It is for the followup collective using Allgatherv. This makes sense now, but we should still just use MPI_AINT directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I suppose all this is tied to the v-collective promotion, since that is the commit that actually promotes the rkey_sizes type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see it now. It is for the followup collective using Allgatherv. This makes sense now, but we should still just use MPI_AINT directly.

We are. The problem is MPI_Aint is not a datatype, so we need create a MPI_Datatype for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MPI_AINT is a datatype.

#define MPI_AINT ((MPI_Datatype)@MPI_AINT_DATATYPE@)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Yeah, I can use MPI_AINT directly. I had a feeling I was missing something.

@hzhou
Copy link
Contributor Author

hzhou commented Feb 9, 2021

Still prefer evaluating the v-collective changes separately. The rest of the PR looks pretty good.

I can do that.

@hzhou hzhou force-pushed the 2101_coll_large branch 2 times, most recently from 7c1f947 to 8fc06c9 Compare February 10, 2021 17:48
@hzhou
Copy link
Contributor Author

hzhou commented Feb 10, 2021

test:mpich/ch3/most
test:mpich/ch4/most

@hzhou
Copy link
Contributor Author

hzhou commented Feb 10, 2021

Jenkins on mac went away. Retest:

test:mpich/ch3/sock

@hzhou
Copy link
Contributor Author

hzhou commented Feb 11, 2021

test:mpich/ch3/sock

@hzhou hzhou requested a review from raffenet February 11, 2021 06:02
@hzhou
Copy link
Contributor Author

hzhou commented Feb 12, 2021

@raffenet This PR now only contains the scalar count conversions. I'll push a separate PR with v-counts. Could you review this PR again?

The blocking neighbor collectives use slight different MPIR_Xxx
conventions than the other blocking collectives. This seems to be an
oversight rather than necessity. To facilitate large count changes,
let's generate them in python autogen.

TODO: make neighbor collective behave the same way as other collectives.
Both are automatically generated by the python scripts now. We may
replace the count parameters with large int types. It is much easy to
generate them than manually update them.
The op functions are not using large count yet.

TODO: fix this.
@hzhou hzhou merged commit 578ea61 into pmodels:main Feb 12, 2021
@hzhou hzhou deleted the 2101_coll_large branch February 12, 2021 19:18
@hzhou
Copy link
Contributor Author

hzhou commented May 29, 2021

Reference #4880

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants