IMB-EXT stalls using openmpi 2.1.3 #4976
Comments
IIRC, that is a known issue. As a workaround, you can
|
I'm redeploying the servers right now. I'll test this ASAP.
As both machines have InfiniBand hardware, shouldn't both use RDMA automatically? |
Also:
|
You can
to see which component is selected. |
Here's what I get:
There seems to be one:
|
I will double-check that. What happens when you blacklist the |
Is this the only log you get when the benchmark hangs? |
Doing this gets it working:
|
No warning or error, just the last printf hanging there. |
Can you collect the same traces with |
You're right, osc/rdma is not available in 1.10.7 (at least in our build). Using Open MPI 1.10.7:
[Cut here as it goes on and on] |
Any chance you could test the latest master? @hjelmn, any recollection of this issue? |
I have an openmpi 3.0.0 package available that I can test quickly if that's of any interest. |
That will be enough for now, thanks |
@ggouaillardet openmpi 3.0.0 behaves exactly like 2.1.3 and stalls |
It seems this has never been fixed, even on master. Can you please give the inline patch a try?
diff --git a/ompi/mca/osc/rdma/osc_rdma_component.c b/ompi/mca/osc/rdma/osc_rdma_component.c
index b5c544a..db450ca 100644
--- a/ompi/mca/osc/rdma/osc_rdma_component.c
+++ b/ompi/mca/osc/rdma/osc_rdma_component.c
@@ -767,6 +767,7 @@ static int ompi_osc_rdma_query_btls (ompi_communicator_t *comm, struct mca_btl_b
int *btl_counts = NULL;
char **btls_to_use;
void *tmp;
+ int tmps[3];
btls_to_use = opal_argv_split (ompi_osc_rdma_btl_names, ',');
if (btls_to_use) {
@@ -793,6 +794,20 @@ static int ompi_osc_rdma_query_btls (ompi_communicator_t *comm, struct mca_btl_b
*btl = selected_btl;
}
+ tmps[0] = (NULL==selected_btl)?0:1;
+ rc = comm->c_coll->coll_allreduce(tmps, tmps+1, 1, MPI_INT, MPI_MAX, comm, comm->c_coll->coll_allreduce_module);
+ if (OMPI_SUCCESS != rc) {
+ return rc;
+ }
+ tmps[2] = (tmps[0] == tmps[1]) ? 1 : 0;
+ rc = comm->c_coll->coll_allreduce(tmps+2, tmps, 1, MPI_INT, MPI_MIN, comm, comm->c_coll->coll_allreduce_module);
+ if (OMPI_SUCCESS != rc) {
+ return rc;
+ }
+ if (!tmps[0]) {
+ return OMPI_ERR_NOT_AVAILABLE;
+ }
+
if (NULL != selected_btl) {
OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_INFO, "selected btl: %s",
selected_btl->btl_component->btl_version.mca_component_name); |
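The patch above makes every process agree before osc/rdma is used: a first MPI_MAX allreduce tells each rank whether any rank found a BTL, and a second MPI_MIN allreduce over "my flag equals the maximum" is 1 only if all ranks agree. The sketch below reproduces that logic without MPI, under the assumption that an array holding one flag per rank is an acceptable stand-in for the distributed collectives; `allreduce_max` and `ranks_agree_on_btl` are hypothetical helper names, not Open MPI functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MPI_Allreduce(..., MPI_MAX): returns the
 * value every rank would receive, given one contribution per rank. */
static int allreduce_max(const int *per_rank, size_t nranks)
{
    assert(nranks > 0);
    int result = per_rank[0];
    for (size_t i = 1; i < nranks; ++i) {
        if (per_rank[i] > result) {
            result = per_rank[i];
        }
    }
    return result;
}

/* Two-step agreement check modeled on the patch.
 * has_btl[i] plays the role of rank i's tmps[0]: 1 if it selected a BTL, else 0.
 * Returns true when every rank contributed the same flag (all 1 or all 0). */
static bool ranks_agree_on_btl(const int *has_btl, size_t nranks)
{
    /* First allreduce (MPI_MAX): the value each rank gets in tmps[1]. */
    int max_flag = allreduce_max(has_btl, nranks);

    /* Each rank computes tmps[2] = (tmps[0] == tmps[1]) ? 1 : 0; the second
     * allreduce (MPI_MIN) over those values is emulated inline here. */
    int all_agree = 1;
    for (size_t i = 0; i < nranks; ++i) {
        int agrees = (has_btl[i] == max_flag) ? 1 : 0;
        if (agrees < all_agree) {
            all_agree = agrees;
        }
    }
    return all_agree == 1;
}
```

With flags `{1, 0}` (one node gets a usable BTL and the other does not, as in this report) the check fails on both ranks, so both fall back to another osc component instead of one of them waiting forever.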
@ggouaillardet Not a configuration I have or care about. If your patch fixes it, let me know. BTW, you can get the same result using a single allreduce:
tmps[0] = (NULL==selected_btl)?0:1;
tmps[1] = -tmps[0];
rc = comm->c_coll->coll_allreduce(MPI_IN_PLACE, tmps, 2, MPI_INT, MPI_MAX, comm, comm->c_coll->coll_allreduce_module);
if (tmps[0] != -tmps[1]) {
    /* results differ */
    return OMPI_ERR_NOT_AVAILABLE;
} |
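The single-allreduce trick works because each rank contributes the pair (x, -x) and MPI_MAX is applied element-wise: afterwards `tmps[0]` holds max(x) and `tmps[1]` holds max(-x) = -min(x), so `tmps[0] == -tmps[1]` exactly when the minimum equals the maximum, i.e. all ranks sent the same value. The sketch below emulates that reduction over a per-rank array instead of calling MPI; `flags_agree` is a hypothetical name, and the values are assumed never to be `INT_MIN`, whose negation overflows (the 0/1 flags used here are safe).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Emulates MPI_Allreduce(MPI_IN_PLACE, tmps, 2, MPI_INT, MPI_MAX, ...)
 * where rank i's buffer is { flags[i], -flags[i] }.
 * Assumes no flag equals INT_MIN, since -INT_MIN overflows. */
static bool flags_agree(const int *flags, size_t nranks)
{
    assert(nranks > 0);
    int tmps[2] = { flags[0], -flags[0] };
    for (size_t i = 1; i < nranks; ++i) {
        const int contrib[2] = { flags[i], -flags[i] }; /* rank i's send buffer */
        for (int j = 0; j < 2; ++j) {
            if (contrib[j] > tmps[j]) {
                tmps[j] = contrib[j]; /* element-wise MPI_MAX */
            }
        }
    }
    /* tmps[0] == max(flags), tmps[1] == -min(flags) */
    return tmps[0] == -tmps[1];
}
```

Packing both quantities into one two-element buffer halves the number of collectives compared with the two-allreduce version, which matters because this check runs on every window creation.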
That said, I do find it odd that ConnectIB doesn't select the verbs BTL. This will not be an issue once the UCT BTL is in place (for reference, see #4919). It will probably go in later this week once I have verified it works with IB. |
@hjelmn, thanks for the comment; I will definitely use a single allreduce. |
@ggouaillardet Had to fix a compile error in your patch (s/comm->c_coll->/comm->c_coll./g), but it fixes the issue. |
Here is a more correct patch [EDIT] use
diff --git a/ompi/mca/osc/rdma/osc_rdma_component.c b/ompi/mca/osc/rdma/osc_rdma_component.c
index b145395..069c9dc 100644
--- a/ompi/mca/osc/rdma/osc_rdma_component.c
+++ b/ompi/mca/osc/rdma/osc_rdma_component.c
@@ -372,6 +372,8 @@ static int ompi_osc_rdma_component_query (struct ompi_win_t *win, void **base, s
int flavor)
{
+ int rc;
+
if (MPI_WIN_FLAVOR_SHARED == flavor) {
return -1;
}
@@ -385,15 +387,18 @@ static int ompi_osc_rdma_component_query (struct ompi_win_t *win, void **base, s
}
#endif /* OPAL_CUDA_SUPPORT */
- if (OMPI_SUCCESS == ompi_osc_rdma_query_mtls ()) {
+ rc = ompi_osc_rdma_query_mtls ();
+ rc = comm->c_coll->coll_allreduce(MPI_IN_PLACE, &rc, 1, MPI_INT, MPI_MIN, comm, comm->c_coll->coll_allreduce_module);
+ if (OMPI_SUCCESS == rc) {
return 5; /* this has to be lower that osc pt2pt default priority */
}
- if (OMPI_SUCCESS != ompi_osc_rdma_query_btls (comm, NULL)) {
+ rc = ompi_osc_rdma_query_btls (comm, NULL);
+ rc = comm->c_coll->coll_allreduce(MPI_IN_PLACE, &rc, 1, MPI_INT, MPI_MIN, comm, comm->c_coll->coll_allreduce_module);
+ if (OMPI_SUCCESS != rc) {
return -1;
}
-
return mca_osc_rdma_component.priority;
}
Similar porting has to be done for the
I will resume my work next week. |
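The second patch reduces the local query result with MPI_MIN. That only works because Open MPI uses 0 for success and negative values for errors, so the minimum over all ranks is success only if every rank succeeded. The sketch below illustrates the idea without MPI. `SKETCH_SUCCESS`, `SKETCH_ERR_NOT_AVAILABLE`, `allreduce_min` and `agreed_query_result` are hypothetical names chosen to mirror that convention. The sketch also keeps the reduced value and the collective's own status in separate variables, because assigning the call's return value to the same variable that served as the in-place buffer would replace the reduced value with the call's status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes following the convention that success is 0
 * and errors are negative. */
#define SKETCH_SUCCESS 0
#define SKETCH_ERR_NOT_AVAILABLE (-1)

/* Emulated MPI_MIN allreduce: stores the reduced value in *result and
 * reports the collective's own status through the return value. */
static int allreduce_min(const int *per_rank, size_t nranks, int *result)
{
    assert(result != NULL);
    if (nranks == 0) {
        return SKETCH_ERR_NOT_AVAILABLE;
    }
    int m = per_rank[0];
    for (size_t i = 1; i < nranks; ++i) {
        if (per_rank[i] < m) {
            m = per_rank[i];
        }
    }
    *result = m;
    return SKETCH_SUCCESS;
}

/* local_rc[i] is the local query result on rank i. Because errors are
 * negative, the minimum is SKETCH_SUCCESS only if every rank succeeded. */
static int agreed_query_result(const int *local_rc, size_t nranks)
{
    int global_rc = SKETCH_SUCCESS;
    /* Keep the collective's status in its own variable so it cannot
     * overwrite the reduced value stored in global_rc. */
    int ret = allreduce_min(local_rc, nranks, &global_rc);
    if (SKETCH_SUCCESS != ret) {
        return ret;
    }
    return global_rc;
}
```

If any rank's local query fails, every rank sees that failure and returns -1 from the component query together, so no process is left alone inside osc/rdma.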
Keep in mind that the patch will hurt RMA performance. If the two systems can talk over InfiniBand and you want performance, you need to figure out why one of them is not getting a valid openib BTL module. |
I will look into that. |
It shouldn't. In the common case, the same BTL will be selected by all processes and we should get OMPI_SUCCESS in rc. I can double-check once we finish service time on our systems. |
@nmorey @hjelmn @ggouaillardet There have been no new updates on here for months. Is this issue still happening at the HEAD of master or the release branches? |
It looks like this issue is expecting a response, but hasn't gotten one yet. If there are no responses in the next 2 weeks, we'll assume that the issue has been abandoned and will close it. |
Per the above comment, it has been a month with no reply on this issue. It looks like this issue has been abandoned. I'm going to close this issue. If I'm wrong and this issue is not abandoned, please feel free to re-open it. Thank you! |
Running IMB-EXT from Intel(R) MPI Benchmarks 2018 Update 1 on SLE12-SP3.
Running with this command stalls:
And then it stalls like this. Both nodes have an IMB-EXT process running at 100%.
On the first host:
On the second host: