Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osc/rdma: two bug fixes #9358

Merged
merged 2 commits into from
Sep 14, 2021
Merged

Conversation

wzamazon
Copy link
Contributor

@wzamazon wzamazon commented Sep 7, 2021

fix two bugs in allocate_state_shared in osc/rdma/osc_rdma_component.c

Currenntly, in function allocate_state_shared,
"module->use_memory_registation" is set to false when all MPI ranks
are on same instances (local_size == global_size).

This is harmful when there is only one btl in use, in which case the selected
btl should determine whether memory registration should be used. For
example, btl/ofi uses memory registration even on same instance.

This is unnecessary when three are two btls in use, in which case
btls that uses memory registration have been excluded in function
ompi_osc_rdma_query_alternate_btls.

Therefore, this commit removes the setting of
    module->use_memory_registration
in allocate_state_shared.

Signed-off-by: Wei Zhang <wzam@amazon.com>
In function "allocate_state_shared", the peer->state_endpoint was
copied from 1st peer (a.k.a local_leader). However, state_endpoint
and state_btl_index of the 1st peer was not set, causing all peers'
state_endpoint being NULL.

This patch addresses the issue by setting 1st peer's state_endpoint
and state_btl_index from its data_endpoint and data_btl_index.

Signed-off-by: Wei Zhang <wzam@amazon.com>
@wzamazon
Copy link
Contributor Author

wzamazon commented Sep 7, 2021

The GitHub Action CI failure is because I did not add a description in the PR. After I did that, the GitHub Action CI passed.

@wzamazon
Copy link
Contributor Author

wzamazon commented Sep 8, 2021

Is there anything else needed for this PR to get merged?

@bwbarrett
Copy link
Member

I'd like @hjelmn to comment in case I was totally off in my review, even if he doesn't do a full review.

@wzamazon
Copy link
Contributor Author

wzamazon commented Sep 9, 2021

@hjelmn Do you have time to take a look?

@wzamazon wzamazon added the bug label Sep 10, 2021
@wzamazon
Copy link
Contributor Author

@hjelmn Can you please take a look at this PR?

@gpaulsen gpaulsen merged commit efe3962 into open-mpi:master Sep 14, 2021
@wzamazon
Copy link
Contributor Author

Thanks! I will open a PR to back port to v5.0.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants