Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osc/rdma: Fix MPI_Win_start()/complete() with MPI_GROUP_EMPTY. #8779

Merged
merged 1 commit into from
Apr 30, 2021

Conversation

awlauria
Copy link
Contributor

@awlauria awlauria commented Apr 6, 2021

  • Make sure the epoch type is set before returning from MPI_Win_start().
  • Make sure the group is only free'd if it is valid in MPI_Win_complete().
    • Fix possible double free() of the group.

Fixes #8677

Signed-off-by: Austen Lauria awlauria@us.ibm.com

@awlauria
Copy link
Contributor Author

@hjelmn ping - can you take a quick look when you have some cycles. Thanks!

@dalcinl
Copy link
Contributor

dalcinl commented Apr 22, 2021

@gpaulsen maybe you can ask someone else to review this PR? The fixes are rather trivial.

@awlauria
Copy link
Contributor Author

@devreal / @bosilca do you mind giving this a quick look? This bug is blocking @dalcinl .

@bosilca
Copy link
Member

bosilca commented Apr 22, 2021

The asymmetry proposed by the patch looks strange to me. Basically you always start an epoch but only increase the refcount on the group when there are processes in the group (group of size zero and not the MPI_GROUP_EMPTY). On the other side you release the refcount if the group is not MPI_GROUP_EMPTY, which indicate that 0-process groups will see their refcount reduced, while they did not go through the proper refcount increase. But I might misread the commit, and/or miss some other checks at the upper level that would provide a shortcut for groups with no processes (but that are not MPI_GROUP_EMPTY)

@awlauria awlauria force-pushed the 8677_fix branch 2 times, most recently from f91f31d to 23bccf9 Compare April 22, 2021 21:20
@awlauria
Copy link
Contributor Author

@bosilca I agree - I restored the symmetry.

@awlauria
Copy link
Contributor Author

@bosilca @hjelmn ping - please re-review.

- Make sure the epoch type is set before returning from MPI_Win_start().
- Make sure the group is only free'd if it is valid in MPI_Win_complete().
  - Fix possible double free() of the group.

Co-authored-by: Lisandro Dalcin <dalcinl@gmail.com>

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
@awlauria awlauria force-pushed the 8677_fix branch 2 times, most recently from f91335b to ec0d6ff Compare April 29, 2021 18:59
@awlauria
Copy link
Contributor Author

bot:aws:retest

@awlauria awlauria merged commit 9f74df9 into open-mpi:master Apr 30, 2021
@awlauria awlauria deleted the 8677_fix branch April 30, 2021 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression: MPI_Win_start()/complete() with MPI_GROUP_EMPTY
3 participants