openmpi MPI_Bcast hang with cuda gdr with RoCE #5479

Closed
LiweiPeng opened this issue Jul 25, 2018 · 1 comment
LiweiPeng commented Jul 25, 2018

Background information

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

3.1.1 (also in 3.0.0)

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Open MPI was built from the 3.1.1 source tarball.
CUDA: 9.2

Please describe the system on which you are running

  • Operating system/version: CentOS 7.4 x64
  • Computer hardware: NVIDIA P100 GPU
  • Network type: RDMA RoCE v2, Mellanox ConnectX-4, Mellanox OFED 4.3, with the nv_peer_memory module loaded.

Details of the problem

The test I ran is osu_bcast from osu-micro-benchmarks-5.4.3.tar.gz, built with CUDA support. The same test case works under MVAPICH2 2.3 with CUDA.

The following is the command line and the stack traces at the hang:

mpirun  --machinefile m2.txt --map-by ppr:1:node --mca btl openib,self,vader --mca btl_openib_cpc_include rdmacm --mca btl_openib_rroce_enable 1 --mca btl_openib_receive_queues P,256,256:P,131072,32:S,131072,32 --mca btl_openib_want_cuda_gdr 1  osu_bcast -d cuda -i 10
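(For reference, Open MPI also reads any MCA parameter from an `OMPI_MCA_`-prefixed environment variable, so the `--mca` flags above can equivalently be exported once instead of repeated on every command line. A sketch; values copied from the command above.)

```shell
# Equivalent configuration via environment variables:
# Open MPI maps "--mca <name> <value>" to OMPI_MCA_<name>=<value>.
export OMPI_MCA_btl=openib,self,vader
export OMPI_MCA_btl_openib_cpc_include=rdmacm
export OMPI_MCA_btl_openib_rroce_enable=1
export OMPI_MCA_btl_openib_receive_queues=P,256,256:P,131072,32:S,131072,32
export OMPI_MCA_btl_openib_want_cuda_gdr=1
echo "btl=$OMPI_MCA_btl"
```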

OSU MPI-CUDA Broadcast Latency Test v5.4.3
Size       Avg Latency(us)
1                       8.37
2                       8.40
4                       8.43
8                       8.65
16                      8.63
32                      8.46
64                      8.53
128                     8.52
256                     8.62
512                     8.64
1024                    8.87
2048                   19.28
4096                   29.94
8192                   52.95
16384                  94.04
32768                 178.76
##^C (hang here)

(gdb) bt
#0  0x00007f0d4f7ce1f5 in poll_device () from /opt/rdma/mpi/openmpi/lib/openmpi/mca_btl_openib.so
#1  0x00007f0d4f7cefa2 in btl_openib_component_progress ()
   from /opt/rdma/mpi/openmpi/lib/openmpi/mca_btl_openib.so
#2  0x00007f0d9041751c in opal_progress () from /opt/rdma/mpi/openmpi/lib/libopen-pal.so.40
#3  0x00007f0d4e95e4ed in mca_pml_ob1_recv () from /opt/rdma/mpi/openmpi/lib/openmpi/mca_pml_ob1.so
#4  0x00007f0d91e8e586 in ompi_coll_base_bcast_intra_split_bintree ()
   from /opt/rdma/mpi/openmpi/lib/libmpi.so.40
#5  0x00007f0d4d4f18ec in ompi_coll_tuned_bcast_intra_dec_fixed ()
   from /opt/rdma/mpi/openmpi/lib/openmpi/mca_coll_tuned.so
#6  0x00007f0d91e5aaf9 in PMPI_Bcast () from /opt/rdma/mpi/openmpi/lib/libmpi.so.40
#7  0x0000000000401e43 in main (argc=5, argv=0x7ffdb8941008) at osu_bcast.c:92

(gdb) bt
#0  mlx5_poll_one (cqe_ver=1, wc_size=48, wc=0x7fffd01007e0, cur_srq=<synthetic pointer>,
    cur_rsc=<synthetic pointer>, cq=<optimized out>) at src/cq.c:942
#1  poll_cq (cqe_ver=1, wc_size=48, wc=<optimized out>, ne=<optimized out>, ibcq=0x285eca0) at src/cq.c:1299
#2  mlx5_poll_cq_1 (ibcq=0x285eca0, ne=<optimized out>, wc=<optimized out>) at src/cq.c:1338
#3  0x00007f7c637ce203 in poll_device () from /opt/rdma/mpi/openmpi/lib/openmpi/mca_btl_openib.so
#4  0x00007f7c637cefa2 in btl_openib_component_progress ()
   from /opt/rdma/mpi/openmpi/lib/openmpi/mca_btl_openib.so
#5  0x00007f7ca444c51c in opal_progress () from /opt/rdma/mpi/openmpi/lib/libopen-pal.so.40
#6  0x00007f7ca5e7a6e5 in ompi_request_default_wait () from /opt/rdma/mpi/openmpi/lib/libmpi.so.40
#7  0x00007f7ca5ecd4ba in ompi_coll_base_barrier_intra_two_procs ()
   from /opt/rdma/mpi/openmpi/lib/libmpi.so.40
#8  0x00007f7ca5e8f7a7 in PMPI_Barrier () from /opt/rdma/mpi/openmpi/lib/libmpi.so.40
#9  0x0000000000401e75 in main (argc=5, argv=0x7fffd0103b88) at osu_bcast.c:98
LiweiPeng (author) commented:
This issue was resolved by using the solutions mentioned in #3972 and #3573.

After setting the following options, it works now:
--mca btl_openib_cuda_async_recv false --mca btl_openib_receive_queues P,256,256:S,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,131072,1024,1008,64
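(To make the workaround's `btl_openib_receive_queues` value easier to read, here is a small sketch that splits it into its colon-separated queue specs. The field interpretation, queue type first, then buffer size and buffer count, with any remaining fields being watermark/credit tuning values, follows the openib BTL convention; treat the labels as illustrative.)

```python
# Split the receive_queues value into individual queue specs:
# P = per-peer queue, S = shared receive queue.
spec = ("P,256,256:"
        "S,128,256,192,128:"
        "S,2048,1024,1008,64:"
        "S,12288,1024,1008,64:"
        "S,131072,1024,1008,64")

for entry in spec.split(":"):
    fields = entry.split(",")
    qtype = fields[0]                       # queue type (P or S)
    size, count = int(fields[1]), int(fields[2])  # buffer size, buffer count
    tuning = [int(f) for f in fields[3:]]   # remaining watermark/credit fields
    print(f"{qtype}: {count} buffers of {size} bytes, tuning={tuning}")
```

This shows the workaround replaces the original two large per-peer queues with a ladder of shared receive queues up to 128 KiB.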
