Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some multi-threading fixes #9302

Merged
merged 2 commits into from
Aug 24, 2021
Merged

Some multi-threading fixes #9302

merged 2 commits into from
Aug 24, 2021

Conversation

awlauria
Copy link
Contributor

@awlauria awlauria commented Aug 20, 2021

ompi/request: Add a read memory barrier to sync the receive buffer so…

…on after wait completes.

We found an issue where with using multiple threads, it is possible for the data
to not be in the buffer before MPI_Wait() returns. Testing the buffer later after
MPI_Wait() returned would show the data arrives eventually without the rmb().

We have seen this issue on Power9 intermittently using PAMI, but in theory could
happen with any transport.

Signed-off-by: Austen Lauria awlauria@us.ibm.com

@awlauria awlauria changed the title Thread fixes Some multi-threading fixes Aug 20, 2021
@awlauria
Copy link
Contributor Author

The second commit looks larger than it really is, it is mostly spacing adjustments.

@awlauria awlauria requested a review from nysal August 20, 2021 14:01
@open-mpi open-mpi deleted a comment from ibm-ompi Aug 20, 2021
@open-mpi open-mpi deleted a comment from ibm-ompi Aug 20, 2021
@open-mpi open-mpi deleted a comment from ibm-ompi Aug 20, 2021
/**
* Register and open all available components, giving them a chance to access the MCA parameters.
*/
OPAL_THREAD_LOCK(&ompi_mpi_topo_bootstrap_mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather have this safety moved down into the framework itself, as I don't think that any lazily initialized framework support, or need, to allow threaded initialization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand on this a bit? The io framework lazy init does something similar to what's done here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you mentionned we have several frameworks that need to be initialized lazily potentially in a multi-threaded way. I would like to have a consistent way to handle their lazy initialization, to avoid any discrepancies between them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #9307 for discussing the lazy framework initialization. I can work on that once we have consensus. We can remove this specific commit from the PR, so that the rest needn't be held up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nysal . Will remove the commit.

ompi/mpi/c/init_thread.c Show resolved Hide resolved
ompi/request/request.h Show resolved Hide resolved
ompi/request/request.h Show resolved Hide resolved
@gpaulsen gpaulsen assigned nysal and gpaulsen and unassigned gpaulsen Aug 24, 2021
@gpaulsen
Copy link
Member

@nysal has some ideas about how to implement @bosilca's suggestion, and some time to work on it. Assigned to him. Thanks @nysal.

…on after wait completes.

We found an issue where with using multiple threads, it is possible for the data
to not be in the buffer before MPI_Wait() returns. Testing the buffer later after
MPI_Wait() returned would show the data arrives eventually without the rmb().

We have seen this issue on Power9 intermittently using PAMI, but in theory could
happen with any transport.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
@awlauria
Copy link
Contributor Author

Removed the commit with the topo changes.

@awlauria awlauria merged commit 38c47fa into open-mpi:master Aug 24, 2021
@awlauria awlauria deleted the thread_fixes branch August 24, 2021 17:20
@awlauria awlauria mentioned this pull request Aug 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants