Open MPI 2.1.0: MPI_Finalize hangs because cuIpcCloseMemHandle fails #3244

Closed
Evgueni-Petrov-aka-espetrov opened this issue Mar 28, 2017 · 10 comments

Evgueni-Petrov-aka-espetrov commented Mar 28, 2017

Hi Open MPI,

Thank you very much for fixing #3042!

We want to switch from version 2.0.2 to 2.1.0, which contains the fix, but when we do, our application starts hanging in MPI_Finalize.
From our point of view, this behavior is a regression in version 2.1.0 with respect to version 2.0.2.

First, MPI_Finalize warns that cuIpcCloseMemHandle failed with return value 4 (CUDA_ERROR_DEINITIALIZED), and then it prints the following messages in a loop:

[hostname:87484] Sleep on 87484
[hostname:87483] Sleep on 87483
[hostname:87478] 1 more process has sent help message help-mpi-common-cuda.txt / cuIpcCloseMemHandle failed
[hostname:87478] 1 more process has sent help message help-mpi-common-cuda.txt / cuIpcCloseMemHandle failed

...
gdb shows the following stack:

#0  0x00007f12df23393d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f12df2337d4 in __sleep (seconds=0)
    at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007f12d59f63bd in cuda_closememhandle ()
   from /home/espetrov/sandbox/install_mpi/lib/libmca_common_cuda.so.20
#3  0x00007f12d55e93c9 in mca_rcache_rgpusm_finalize ()
   from /home/espetrov/sandbox/install_mpi/lib/openmpi/mca_rcache_rgpusm.so
#4  0x00007f12deca5b92 in mca_rcache_base_module_destroy ()
   from /home/espetrov/sandbox/install_mpi/lib/libopen-pal.so.20
#5  0x00007f12d435e57a in mca_btl_smcuda_del_procs ()
   from /home/espetrov/sandbox/install_mpi/lib/openmpi/mca_btl_smcuda.so
#6  0x00007f12d51e1042 in mca_bml_r2_del_procs ()
   from /home/espetrov/sandbox/install_mpi/lib/openmpi/mca_bml_r2.so
#7  0x00007f12dfaa2918 in ompi_mpi_finalize ()
   from /home/espetrov/sandbox/install_mpi/lib/libmpi.so.20

I am not sure, but it looks like MPI_Finalize tries to close a remote memory handle after the remote MPI process has already unloaded libcuda.so.

Perhaps getting CUDA_ERROR_DEINITIALIZED from cuIpcCloseMemHandle at this point is harmless?
Our CUDA version is 7.5, and the CUDA driver version is 361.93.02.
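
For reference, the numeric code can be mapped back to its symbolic name with the driver API itself instead of grepping cuda.h. Here is a minimal standalone sketch (not Open MPI code; it assumes CUDA 6.0 or newer, which provides cuGetErrorName/cuGetErrorString, and that cuda.h and libcuda are on the build paths):

/* err2name.c -- minimal sketch (not part of Open MPI): look up a CUDA
 * driver API error code by number instead of reading cuda.h.
 * Build: gcc err2name.c -o err2name -lcuda   (with cuda.h on the include path)
 */
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

int main(int argc, char **argv)
{
    /* Default to 4, the value reported by cuIpcCloseMemHandle here. */
    CUresult code = (CUresult) (argc > 1 ? atoi(argv[1]) : 4);
    const char *name = NULL;
    const char *desc = NULL;

    /* These lookups do not need cuInit(), so they work even after the
     * driver has been deinitialized. */
    cuGetErrorName(code, &name);
    cuGetErrorString(code, &desc);

    printf("%d -> %s: %s\n", (int) code,
           name ? name : "unknown", desc ? desc : "unknown");
    return 0;
}

With no argument it looks up code 4, i.e. CUDA_ERROR_DEINITIALIZED, which means the driver was already shutting down when the handle was closed.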

Evgueni.

@jsquyres
Member

@sjeaugey This appears to be stuck in a CUDA call. Can you look into this?

@sjeaugey
Member

Indeed, looks like a problem on our part. I'll look into this.

@sjeaugey
Member

Maybe it's not even stuck. For some reason, there is a sleep(20) (!) for every cuIpcCloseMemHandle that fails. So if we're emptying our cache, there may be a lot of handles to close, and it can take a really long time!
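
To put rough numbers on that, here is a back-of-the-envelope sketch (illustrative only, not Open MPI code; the handle counts are made up, and the 20 s is simply the hard-coded sleep(20) in the error path):

/* sleep_cost.c -- back-of-the-envelope sketch: the pre-fix error path
 * sleeps 20 s for every cuIpcCloseMemHandle that fails, so a rank that
 * flushes n stale IPC registrations at finalize pays roughly 20*n
 * seconds before MPI_Finalize can return. */
#include <stdio.h>

int main(void)
{
    const int sleep_per_failure_s = 20;   /* the sleep(20) in cuda_closememhandle() */

    /* Hypothetical cache sizes, just to show how quickly this adds up. */
    for (int n = 10; n <= 1000; n *= 10) {
        int total = sleep_per_failure_s * n;
        printf("%4d failed closes -> %6d s (~%d min)\n", n, total, total / 60);
    }
    return 0;
}

Even a modest cache of stale IPC registrations is enough to make MPI_Finalize look hung rather than merely slow.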

I noticed yesterday that my MTT run hadn't finished by the morning, so I had to kill it, but I didn't have time to look into it. Same today. Maybe this is the reason.

I'm currently testing a fix that ignores CUDA_ERROR_DEINITIALIZED return codes and, most importantly, removes the sleep(20).

@sjeaugey
Member

I haven't been able to reproduce the bug so far.

I tested this patch:

diff --git a/opal/mca/common/cuda/common_cuda.c b/opal/mca/common/cuda/common_cuda.c
index 2ce3b20..d66f00b 100644
--- a/opal/mca/common/cuda/common_cuda.c
+++ b/opal/mca/common/cuda/common_cuda.c
@@ -1157,10 +1157,10 @@ int cuda_closememhandle(void *reg_data, mca_rcache_base_registration_t *reg)
     if (ctx_ok) {
         result = cuFunc.cuIpcCloseMemHandle((CUdeviceptr)cuda_reg->base.alloc_base);
         if (OPAL_UNLIKELY(CUDA_SUCCESS != result)) {
-            opal_show_help("help-mpi-common-cuda.txt", "cuIpcCloseMemHandle failed",
-                           true, result, cuda_reg->base.alloc_base);
-            opal_output(0, "Sleep on %d", getpid());
-            sleep(20);
+            if (CUDA_ERROR_DEINITIALIZED != result) {
+                opal_show_help("help-mpi-common-cuda.txt", "cuIpcCloseMemHandle failed",
+                true, result, cuda_reg->base.alloc_base);
+            }
             /* We will just continue on and hope things continue to work. */
         } else {
             opal_output_verbose(10, mca_common_cuda_output,

It compiles and works, but since I can't reproduce the bug, I can't confirm for sure that it fixes the problem.

@Evgueni-Petrov-aka-espetrov can you give it a try?

@hppritcha added this to the v2.1.1 milestone Mar 30, 2017

Evgueni-Petrov-aka-espetrov commented Mar 31, 2017

Thanks for the fix, @sjeaugey!
It works for our application.

@sjeaugey
Member

Thanks for sharing the result. @jsquyres, is it OK for me to push that patch to master directly, since Evgueni confirmed it fixes the issue?


rhc54 commented Mar 31, 2017

Why not just put it in a branch and submit a PR like normal? It would allow the CI to ensure nothing broke outside of this environment.

@sjeaugey
Member

Sure -- just takes more time. I'll submit a PR.


renganxu commented Apr 24, 2018

@sjeaugey I still have this problem when running the Horovod benchmark (an MPI framework for TensorFlow). I tried both Open MPI 2.1.1 and the latest 3.0.1, and the problem is still there. My CUDA version is 9.0.176 and the GPU driver is 387.26.

The following are the last few lines of my output for Open MPI 3.0.1:

The call to cuIpcCloseMemHandle failed. This is a warning and the program
will continue to run.
  cuIpcCloseMemHandle return value:   4
  address: 0x2ab6b8a00000
Check the cuda.h file for what the return value means. Perhaps a reboot
of the node will clear the problem.
--------------------------------------------------------------------------
[node023:07102] Sleep on 7102
[node023:07100] Sleep on 7100
[node023:07101] Sleep on 7101

@sjeaugey
Member

@hfutxrg This is expected, since the patch above has not been merged into 3.0.x, only into 3.1.x.
So I would suggest you try 3.1 and see whether you can still reproduce the issue.
https://www.open-mpi.org/software/ompi/v3.1/

Thanks!
