Hang in clock_gettime() during Bcast #3445

Closed
junjieqian opened this issue May 4, 2017 · 8 comments
junjieqian commented May 4, 2017

Thank you for taking the time to submit an issue!

Background information

Open MPI hangs in clock_gettime() during Bcast().

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

v1.10.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Built from a distribution tarball.

Please describe the system on which you are running

  • Operating system/version: Ubuntu 14.04 in Docker container
  • Computer hardware: GPU/CPU
  • Network type: IB

Details of the problem

Open MPI hangs in clock_gettime(). It happens from time to time, and most of the affected jobs run on the same machine. The hang can last for hours or never resolve.

This issue is similar to #99, which seems to have been resolved.
The stack trace is as follows:

#0  0x00007fffa47f7b19 in clock_gettime ()
#1  0x00007f48297f485d in __GI___clock_gettime (clock_id=<optimized out>, tp=<optimized out>) at ../sysdeps/unix/clock_gettime.c:115
#2  0x00007f4817e61931 in opal_timer_base_get_usec_clock_gettime () from /usr/local/openmpi-1.10.3-cuda-8.0/lib/libopen-pal.so.13
#3  0x00007f4817de1689 in opal_progress () from /usr/local/openmpi-1.10.3-cuda-8.0/lib/libopen-pal.so.13
#4  0x00007f482a3153e5 in ompi_request_default_wait () from /usr/local/mpi/lib/libmpi.so.12
#5  0x00007f480df51990 in ompi_coll_tuned_bcast_intra_generic () from /usr/local/openmpi-1.10.3-cuda-8.0/lib/openmpi/mca_coll_tuned.so
#6  0x00007f480df51e67 in ompi_coll_tuned_bcast_intra_binomial () from /usr/local/openmpi-1.10.3-cuda-8.0/lib/openmpi/mca_coll_tuned.so
#7  0x00007f480df4676c in ompi_coll_tuned_bcast_intra_dec_fixed () from /usr/local/openmpi-1.10.3-cuda-8.0/lib/openmpi/mca_coll_tuned.so
#8  0x00007f482a329700 in PMPI_Bcast () from /usr/local/mpi/lib/libmpi.so.12
#9  0x0000000000bd29d6 in ...::Bcast (this=this@entry=0x17a7a00, buffer=buffer@entry=0x7f47ccd4f710, count=64, datatype=0x7f482a5928a0 <ompi_mpi_char>,
    root=0) at .../MPIWrapper.cpp:853
jsquyres (Member) commented May 4, 2017

Do you know if the hang itself is in clock_gettime(), or is Open MPI simply calling clock_gettime() all the time? (we use clock_gettime() as part of our internal progress engine)

Does the problem happen in v1.10.6? Or v2.1.0?

junjieqian (Author) commented May 4, 2017

Hi @jsquyres, thank you very much for your attention and quick response! The hang appears to be inside Open MPI; I got additional stack traces from the other ranks, as follows.

#0  0x00007f7a6ecc390f in mca_btl_sm_component_progress () from /usr/local/openmpi-1.10.3-cuda-8.0/lib/openmpi/mca_btl_sm.so
#1  0x00007f7a774c365a in opal_progress () from /usr/local/openmpi-1.10.3-cuda-8.0/lib/libopen-pal.so.13
#2  0x00007f7a899f73e5 in ompi_request_default_wait () from /usr/local/mpi/lib/libmpi.so.12
#3  0x00007f7a6d633990 in ompi_coll_tuned_bcast_intra_generic () from /usr/local/openmpi-1.10.3-cuda-8.0/lib/openmpi/mca_coll_tuned.so
#4  0x00007f7a6d633e67 in ompi_coll_tuned_bcast_intra_binomial () from /usr/local/openmpi-1.10.3-cuda-8.0/lib/openmpi/mca_coll_tuned.so
#5  0x00007f7a6d62876c in ompi_coll_tuned_bcast_intra_dec_fixed () from /usr/local/openmpi-1.10.3-cuda-8.0/lib/openmpi/mca_coll_tuned.so
#6  0x00007f7a89a0b700 in PMPI_Bcast () from /usr/local/mpi/lib/libmpi.so.12
#7  0x0000000000bd29d6 in ...MPIWrapperMpi::Bcast (this=this@entry=0x22dfa00, buffer=buffer@entry=0x7f7a2c002190, count=64, datatype=0x7f7a89c748a0 <ompi_mpi_char>,
    root=0) at ...MPIWrapper.cpp:853

bosilca (Member) commented May 4, 2017

So the hang is not in clock_gettime, but in opal_progress, which waits for the completion of the requests generated during the bcast. How many processes are involved in your parallel application? Can you check the stacks of all processes to make sure they all reached the same MPI_Bcast?

junjieqian (Author) commented May 4, 2017

Hi @bosilca, there are 8 processes in total. Seven processes are stuck in clock_gettime(), and one is in mca_btl_sm_component_progress().

Can you check the stacks of all processes to make sure they all reached the same MPI_Bcast?

Do you mean there should be a barrier before calling MPI_Bcast? Currently, one rank does some extra work while the others go to MPI_Bcast directly; could that be the problem, given that the hang happens randomly?
The stack traces show that rank 0 (the one doing the extra work) is stuck in mca_btl_sm_component_progress(), while the other ranks are in clock_gettime().

bosilca (Member) commented May 4, 2017

They cannot be stuck in clock_gettime. When you stop a process, it may happen to be in clock_gettime at that instant, but that particular function does not block. In fact, if you look at the stack trace, you can see that you are in opal_progress, which loops around polling the network and calling clock_gettime until messages are received. Thus, I assume the culprit is that an expected message never arrives, so your process appears blocked in opal_progress (and therefore in clock_gettime).
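
To illustrate the point, here is a minimal, self-contained sketch of a polling progress loop. It is not Open MPI's actual implementation, and the helper functions are hypothetical stand-ins for transport polling and request state. A backtrace sampled from such a loop frequently lands in clock_gettime() even though nothing is blocked inside that call.

/* Minimal sketch, NOT Open MPI's code: a progress loop that spins,
 * polling the transports and reading the clock, until a request
 * completes. */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-ins for BTL polling and request state. */
static int pending = 1000000;                         /* pretend work remains */
static bool poll_transports(void)     { return --pending % 3 == 0; }
static bool request_is_complete(void) { return pending <= 0; }

static void wait_for_completion(void)
{
    struct timespec now;
    while (!request_is_complete()) {
        poll_transports();                    /* shared memory, TCP, IB ...  */
        clock_gettime(CLOCK_MONOTONIC, &now); /* timestamp used for timeouts */
    }
}

int main(void)
{
    wait_for_completion();
    puts("request completed");
    return 0;
}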

There is no need for a barrier before the bcast, but all processes on the communicator on which the bcast is called must call MPI_Bcast. I just wanted to make sure this is indeed the case. What really matters is whether there is an MPI_Bcast on the stack trace, not which function the processes happen to be in when you sample them.
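
Concretely, the required pattern might look like the following standalone sketch (illustrative only, not the reporter's MPIWrapper code): every rank in the communicator calls MPI_Bcast with a matching root, count, and datatype, and no barrier is needed even though the root does extra work first.

/* Illustrative standalone example: all ranks in MPI_COMM_WORLD call
 * MPI_Bcast; the root may do extra work first and no barrier is needed.
 * Build with e.g. "mpicc bcast_demo.c -o bcast_demo" (file name is
 * arbitrary) and launch with mpirun. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    char buffer[64] = "not yet set";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        snprintf(buffer, sizeof(buffer), "hello from root");
        sleep(1);                 /* root does some extra work first */
    }

    /* Every rank on the communicator must make this matching call. */
    MPI_Bcast(buffer, 64, MPI_CHAR, 0, MPI_COMM_WORLD);

    printf("rank %d received: %s\n", rank, buffer);
    MPI_Finalize();
    return 0;
}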

junjieqian (Author) commented:

@bosilca, thank you for the explanation! I double-checked the stack traces of all the ranks, and they all show PMPI_Bcast () from /usr/local/mpi/lib/libmpi.so.12.

jsquyres (Member) commented May 4, 2017

I would encourage you to upgrade your version of Open MPI to at least the latest in the v1.10 series (i.e., v1.10.6) to see if this bug has already been fixed. If possible, you might want to upgrade to Open MPI v2.1.0.

ggouaillardet (Contributor) commented:

@junjieqian the hang could occur when MPI_Bcast internally tries to establish a btl/tcp connection, which could be blocked by a firewall since you are using Docker.
Can you double-check that no firewall is running in your containers?
Assuming your IP interface is eth0, what if you run

mpirun --mca btl_tcp_if_include eth0 --mca coll ^tuned ...
