mpich 3.4a3 with ch4:ofi:gni hangs at message size > 8192 on Cori #4720

Closed
brandongc opened this issue Jul 30, 2020 · 13 comments · Fixed by #5085
@brandongc

cookbg@cori04:~/src/mpich-3.4a3> module list -l
- Package -----------------------------+- Versions -+- Last mod. ------
Currently Loaded Modulefiles:
modules/3.2.11.4                                     2019/10/23 20:26:46
gcc/8.3.0                                            2019/03/19 19:57:48
craype-haswell                                       2020/06/19 19:00:22
craype-hugepages2M                                   2020/06/19 19:00:22
craype-network-aries                                 2020/06/19 19:00:22
craype/2.6.2                                         2020/06/19 19:00:22
cray-mpich/7.7.10                                    2020/06/19 18:11:49
cray-libsci/19.06.1                                  2020/06/19 18:10:58
udreg/2.3.2-7.0.1.1_3.31__g8175d3d.ari               2020/06/04 19:51:11
ugni/6.0.14.0-7.0.1.1_7.33__ge78e5b0.ar              2020/06/11 16:50:33
pmi/5.0.14                                           2020/06/19 17:41:13
dmapp/7.1.1-7.0.1.1_4.48__g38cf134.ari               2020/06/11 17:01:45
gni-headers/5.0.12.0-7.0.1.1_6.28__g3b1              2020/06/11 16:46:01
xpmem/2.2.20-7.0.1.1_4.10__g0475745.ari              2020/06/04 19:44:21
job/2.2.4-7.0.1.1_3.36__g36b56f4.ari                 2020/06/04 19:44:17
dvs/2.12_2.2.156-7.0.1.1_8.9__g5aab709e              2020/06/11 17:14:31
alps/6.6.58-7.0.1.1_6.4__g437d88db.ari               2020/06/11 16:57:55
rca/2.2.20-7.0.1.1_4.46__g8e3fb5b.ari                2020/06/11 16:56:30
atp/2.1.3                                            2020/06/19 17:41:20
PrgEnv-gnu/6.0.5                                     2020/06/19 19:12:05
cookbg@cori04:~/src/mpich-3.4a3> ./configure --prefix=$(realpath ~/opt/mpich34) --with-device=ch4:ofi:gni --enable-ugni-static --with-pm=none --with-pmi=cray CC=gcc CXX=g++ FC=gfortran
cookbg@cori04:~/src/osu-micro-benchmarks-5.6.3> export PATH=~/opt/mpich34/bin:$PATH
cookbg@cori04:~/src/osu-micro-benchmarks-5.6.3> which mpicc
/global/homes/c/cookbg/opt/mpich34/bin/mpicc
cookbg@cori04:~/src/osu-micro-benchmarks-5.6.3> mpicc -show
gcc -I/global/u1/c/cookbg/opt/mpich34/include -L/global/u1/c/cookbg/opt/mpich34/lib -Wl,-rpath -Wl,/global/u1/c/cookbg/opt/mpich34/lib -Wl,--enable-new-dtags -lmpi
cookbg@cori04:~/src/osu-micro-benchmarks-5.6.3> ./configure CC=mpicc CXX=mpic++ CFLAGS="-march=core-avx2" CXXFLAGS="-march=core-avx2"
cookbg@cori04:~/src/osu-micro-benchmarks-5.6.3> salloc -q interactive -C haswell -N 2 -t 30
salloc: Granted job allocation 32966309
salloc: Waiting for resource configuration
salloc: Nodes nid000[58-59] are ready for job
cookbg@nid00058:~/src/osu-micro-benchmarks-5.6.3> srun -N2 -n2 -c2 --cpu-bind=cores ./mpi/pt2pt/osu_latency
# OSU MPI Latency Test v5.6.3
# Size          Latency (us)
0                       1.59
1                       1.62
2                       1.63
4                       1.63
8                       1.63
16                      1.62
32                      1.63
64                      1.63
128                     1.65
256                     1.82
512                     1.73
1024                    2.07
2048                    2.44
4096                    3.27
8192                    4.91
# THIS IS WHERE IT HANGS

Some samples collected with perf record -g --pid <pid>:

rank 0

# Samples: 27  of event 'cycles:ppp'
# Event count (approx.): 49577050092341
#
# Children      Self  Command      Shared Object       Symbol                          
# ........  ........  ...........  ..................  ................................
#
   100.00%     0.00%  osu_latency  libmpi.so.0.0.0     [.] __gnix_cq_sreadfrom.isra.0
            |
            ---__gnix_cq_sreadfrom.isra.0
               |          
               |--96.12%--_gnix_prog_progress
               |          |          
               |          |--4.44%--_gnix_nic_progress
               |          |          |          
               |          |           --4.44%--__nic_tx_progress
               |          |          
               |           --2.97%--GNI_CqTestEvent
               |          
                --3.88%--__pthread_rwlock_rdlock

    96.12%    88.71%  osu_latency  libmpi.so.0.0.0     [.] _gnix_prog_progress
            |          
            |--88.71%--__gnix_cq_sreadfrom.isra.0
            |          _gnix_prog_progress
            |          
             --7.41%--_gnix_prog_progress
                       |          
                       |--4.44%--_gnix_nic_progress
                       |          |          
                       |           --4.44%--__nic_tx_progress
                       |          
                        --2.97%--GNI_CqTestEvent

     4.44%     0.00%  osu_latency  libmpi.so.0.0.0     [.] _gnix_nic_progress
            |          
             --4.44%--_gnix_nic_progress
                       |          
                        --4.44%--__nic_tx_progress

     4.44%     4.44%  osu_latency  libmpi.so.0.0.0     [.] __nic_tx_progress
            |          
             --4.44%--__gnix_cq_sreadfrom.isra.0
                       _gnix_prog_progress
                       |          
                        --4.44%--_gnix_nic_progress
                                  __nic_tx_progress

     3.88%     3.88%  osu_latency  libpthread-2.26.so  [.] __pthread_rwlock_rdlock
            |
            ---__gnix_cq_sreadfrom.isra.0
               __pthread_rwlock_rdlock

     2.97%     2.97%  osu_latency  libmpi.so.0.0.0     [.] GNI_CqTestEvent
            |
            ---__gnix_cq_sreadfrom.isra.0
               _gnix_prog_progress
               GNI_CqTestEvent

     0.00%     0.00%  osu_latency  libmpi.so.0.0.0     [.] GNII_DlaProgress
     0.00%     0.00%  osu_latency  libmpi.so.0.0.0     [.] __nic_get_completed_txd
     0.00%     0.00%  osu_latency  libmpi.so.0.0.0     [.] _gnix_prog_progress@plt
     0.00%     0.00%  osu_latency  libmpi.so.0.0.0     [.] MPIDI_POSIX_eager_recv_begin
     0.00%     0.00%  osu_latency  libmpi.so.0.0.0     [.] MPID_Progress_wait
     0.00%     0.00%  osu_latency  [unknown]           [.] 0x00093f7000000001
     0.00%     0.00%  osu_latency  libmpi.so.0.0.0     [.] _gnix_vc_nic_progress
     0.00%     0.00%  osu_latency  libmpi.so.0.0.0     [.] _gnix_cm_nic_progress
     0.00%     0.00%  osu_latency  libmpi.so.0.0.0     [.] GNI_CqGetEvent.part.5
     0.00%     0.00%  osu_latency  [kernel]            [k] native_iret
     0.00%     0.00%  osu_latency  [unknown]           [k] 0xffffffff00000000

rank 1

# Samples: 74K of event 'cycles:ppp'
# Event count (approx.): 64830618685
#
# Children      Self  Command      Shared Object       Symbol                          
# ........  ........  ...........  ..................  ................................
#
    50.78%     6.44%  osu_latency  libmpi.so.0.0.0     [.] __gnix_cq_sreadfrom.isra.0
            |
            ---__gnix_cq_sreadfrom.isra.0
               |          
               |--38.01%--_gnix_prog_progress
               |          |          
               |          |--25.26%--_gnix_nic_progress
               |          |          |          
               |          |          |--16.97%--__nic_tx_progress
               |          |          |          |          
               |          |          |          |--7.03%--__nic_get_completed_txd
               |          |          |          |          
               |          |          |          |--2.44%--GNI_CqGetEvent.part.5
               |          |          |          |          
               |          |          |           --0.63%--GNI_CqGetEvent@plt
               |          |          |          
               |          |          |--2.14%--_gnix_vc_nic_progress
               |          |          |          
               |          |          |--1.24%--pthread_spin_lock@plt
               |          |          |          
               |          |          |--0.87%--pthread_spin_unlock@plt
               |          |          |          
               |          |          |--0.62%--pthread_spin_lock
               |          |          |          
               |          |           --0.51%--__nic_get_completed_txd
               |          |          
               |          |--3.19%--GNI_CqTestEvent
               |          |          
               |          |--3.11%--_gnix_cm_nic_progress
               |          |          
               |           --0.57%--__nic_tx_progress
               |          
               |--3.13%--__pthread_rwlock_unlock
               |          
                --2.16%--__pthread_rwlock_rdlock

    38.14%     5.23%  osu_latency  libmpi.so.0.0.0     [.] _gnix_prog_progress
            |          
            |--33.06%--_gnix_prog_progress
            |          |          
            |          |--25.26%--_gnix_nic_progress
            |          |          |          
            |          |          |--16.97%--__nic_tx_progress
            |          |          |          |          
            |          |          |          |--7.03%--__nic_get_completed_txd
            |          |          |          |          
            |          |          |          |--2.44%--GNI_CqGetEvent.part.5
            |          |          |          |          
            |          |          |           --0.63%--GNI_CqGetEvent@plt
            |          |          |          
            |          |          |--2.14%--_gnix_vc_nic_progress
            |          |          |          
            |          |          |--1.24%--pthread_spin_lock@plt
            |          |          |          
            |          |          |--0.87%--pthread_spin_unlock@plt
            |          |          |          
            |          |          |--0.62%--pthread_spin_lock
            |          |          |          
            |          |           --0.51%--__nic_get_completed_txd
            |          |          
            |          |--3.19%--GNI_CqTestEvent
            |          |          
            |          |--3.11%--_gnix_cm_nic_progress
            |          |          
            |           --0.57%--__nic_tx_progress
            |          
             --5.09%--__gnix_cq_sreadfrom.isra.0
                       _gnix_prog_progress

    25.63%     3.02%  osu_latency  libmpi.so.0.0.0     [.] _gnix_nic_progress
            |          
            |--22.61%--_gnix_nic_progress
            |          |          
            |          |--16.97%--__nic_tx_progress
            |          |          |          
            |          |          |--7.03%--__nic_get_completed_txd
            |          |          |          
            |          |          |--2.44%--GNI_CqGetEvent.part.5
            |          |          |          
            |          |           --0.63%--GNI_CqGetEvent@plt
            |          |          
            |          |--2.14%--_gnix_vc_nic_progress
            |          |          
            |          |--1.24%--pthread_spin_lock@plt
            |          |          
            |          |--0.87%--pthread_spin_unlock@plt
            |          |          
            |          |--0.62%--pthread_spin_lock
            |          |          
            |           --0.51%--__nic_get_completed_txd
            |          
             --3.02%--__gnix_cq_sreadfrom.isra.0
                       |          
                        --2.65%--_gnix_prog_progress
                                  _gnix_nic_progress

    17.53%     7.18%  osu_latency  libmpi.so.0.0.0     [.] __nic_tx_progress
            |          
            |--10.35%--__nic_tx_progress
            |          |          
            |          |--7.03%--__nic_get_completed_txd
            |          |          
            |          |--2.44%--GNI_CqGetEvent.part.5
            |          |          
            |           --0.63%--GNI_CqGetEvent@plt
            |          
             --7.18%--__gnix_cq_sreadfrom.isra.0
                       _gnix_prog_progress
                       |          
                       |--6.61%--_gnix_nic_progress
                       |          __nic_tx_progress
                       |          
                        --0.57%--__nic_tx_progress

    12.98%     0.00%  osu_latency  [unknown]           [.] 0000000000000000
            |
            ---0
               |          
               |--6.52%--GNI_CqGetEvent.part.5
               |          
               |--5.68%--GNI_CqGetEvent
               |          
                --0.74%--GNII_DlaProgress@plt

     8.96%     8.96%  osu_latency  libmpi.so.0.0.0     [.] GNI_CqGetEvent.part.5
            |          
            |--6.52%--0
            |          GNI_CqGetEvent.part.5
            |          
             --2.44%--__gnix_cq_sreadfrom.isra.0
                       _gnix_prog_progress
                       _gnix_nic_progress
                       __nic_tx_progress
                       GNI_CqGetEvent.part.5

     8.79%     8.78%  osu_latency  libmpi.so.0.0.0     [.] GNII_DlaProgress
            |          
             --8.76%--GNII_DlaProgress

     7.53%     7.53%  osu_latency  libmpi.so.0.0.0     [.] __nic_get_completed_txd
            |          
             --7.53%--__gnix_cq_sreadfrom.isra.0
                       _gnix_prog_progress
                       _gnix_nic_progress
                       |          
                       |--7.03%--__nic_tx_progress
                       |          __nic_get_completed_txd
                       |          
                        --0.51%--__nic_get_completed_txd

     6.90%     6.90%  osu_latency  libmpi.so.0.0.0     [.] progress_test
            |          
            |--3.93%--progress_test
            |          
             --2.97%--0x93f7000000001
                       progress_test

     5.94%     5.94%  osu_latency  libmpi.so.0.0.0     [.] GNI_CqGetEvent
            |          
             --5.68%--0
                       GNI_CqGetEvent

     5.20%     0.00%  osu_latency  [unknown]           [.] 0x00093f7000000001
            |
            ---0x93f7000000001
               |          
               |--2.97%--progress_test
               |          
                --1.75%--MPID_Progress_wait

     4.78%     4.77%  osu_latency  libmpi.so.0.0.0     [.] MPIDI_OFI_progress
            |
            ---MPIDI_OFI_progress

     3.72%     3.71%  osu_latency  libmpi.so.0.0.0     [.] MPIR_Progress_hook_exec_all
            |          
             --3.25%--MPIR_Progress_hook_exec_all

     3.54%     3.54%  osu_latency  libmpi.so.0.0.0     [.] MPIDI_POSIX_progress
            |
            ---MPIDI_POSIX_progress

     3.37%     3.36%  osu_latency  libpthread-2.26.so  [.] __pthread_rwlock_unlock
            |          
             --3.13%--__gnix_cq_sreadfrom.isra.0
                       __pthread_rwlock_unlock

     3.19%     3.19%  osu_latency  libmpi.so.0.0.0     [.] GNI_CqTestEvent
            |          
             --3.19%--__gnix_cq_sreadfrom.isra.0
                       _gnix_prog_progress
                       GNI_CqTestEvent

     3.15%     3.13%  osu_latency  libmpi.so.0.0.0     [.] _gnix_cm_nic_progress
            |          
             --3.13%--__gnix_cq_sreadfrom.isra.0
                       |          
                        --3.10%--_gnix_prog_progress
                                  _gnix_cm_nic_progress

     2.51%     2.51%  osu_latency  libmpi.so.0.0.0     [.] _gnix_vc_nic_progress
            |
            ---__gnix_cq_sreadfrom.isra.0
               _gnix_prog_progress
               |          
                --2.14%--_gnix_nic_progress
                          _gnix_vc_nic_progress

     2.21%     2.20%  osu_latency  libmpi.so.0.0.0     [.] MPIDI_POSIX_eager_recv_begin
            |
            ---MPIDI_POSIX_eager_recv_begin

     2.16%     2.16%  osu_latency  libpthread-2.26.so  [.] __pthread_rwlock_rdlock
            |
            ---__gnix_cq_sreadfrom.isra.0
               __pthread_rwlock_rdlock

     1.75%     1.75%  osu_latency  libmpi.so.0.0.0     [.] MPID_Progress_wait
            |
            ---0x93f7000000001
               MPID_Progress_wait

     1.37%     1.37%  osu_latency  libmpi.so.0.0.0     [.] pthread_spin_lock@plt
            |          
             --1.24%--__gnix_cq_sreadfrom.isra.0
                       _gnix_prog_progress
                       _gnix_nic_progress
                       pthread_spin_lock@plt

     1.19%     1.19%  osu_latency  libmpi.so.0.0.0     [.] pthread_spin_unlock@plt
            |          
             --0.86%--__gnix_cq_sreadfrom.isra.0
                       _gnix_prog_progress
                       _gnix_nic_progress
                       pthread_spin_unlock@plt

     0.98%     0.98%  osu_latency  libmpi.so.0.0.0     [.] MPIDI_SHM_progress
            |
            ---MPIDI_SHM_progress

     0.88%     0.88%  osu_latency  libmpi.so.0.0.0     [.] _gnix_queue_peek@plt
            |
            ---_gnix_queue_peek@plt

     0.85%     0.85%  osu_latency  libpthread-2.26.so  [.] pthread_spin_lock
            |          
             --0.62%--__gnix_cq_sreadfrom.isra.0
                       _gnix_prog_progress
                       _gnix_nic_progress
                       pthread_spin_lock

     0.74%     0.74%  osu_latency  libmpi.so.0.0.0     [.] GNII_DlaProgress@plt
            |
            ---0
               GNII_DlaProgress@plt

     0.63%     0.63%  osu_latency  libmpi.so.0.0.0     [.] GNI_CqGetEvent@plt
            |
            ---__gnix_cq_sreadfrom.isra.0
               _gnix_prog_progress
               _gnix_nic_progress
               __nic_tx_progress
               GNI_CqGetEvent@plt

     0.61%     0.61%  osu_latency  libmpi.so.0.0.0     [.] MPIDI_OFI_get_buffered
            |
            ---MPIDI_OFI_get_buffered

     0.46%     0.46%  osu_latency  libpthread-2.26.so  [.] pthread_spin_unlock
     0.44%     0.44%  osu_latency  libmpi.so.0.0.0     [.] _gnix_prog_progress@plt
     0.35%     0.35%  osu_latency  libmpi.so.0.0.0     [.] pthread_rwlock_unlock@plt
     0.32%     0.32%  osu_latency  libmpi.so.0.0.0     [.] _gnix_queue_peek
     0.28%     0.28%  osu_latency  libmpi.so.0.0.0     [.] pthread_rwlock_rdlock@plt
     0.22%     0.22%  osu_latency  libmpi.so.0.0.0     [.] GNI_CqTestEvent@plt
     0.19%     0.19%  osu_latency  libmpi.so.0.0.0     [.] _gnix_vc_nic_progress@plt
     0.12%     0.12%  osu_latency  libmpi.so.0.0.0     [.] gnix_cq_read
     0.06%     0.06%  osu_latency  [kernel]            [k] native_iret
     0.01%     0.01%  osu_latency  libmpi.so.0.0.0     [.] GNI_PostDataProbeById
     0.01%     0.01%  osu_latency  libmpi.so.0.0.0     [.] _gnix_dgram_poll
     0.00%     0.00%  osu_latency  libmpi.so.0.0.0     [.] _gnix_dgram_poll@plt
     0.00%     0.00%  osu_latency  libmpi.so.0.0.0     [.] GNI_PostDataProbeById@plt

separate libfabric

I also built my own libfabric 1.10.1 with

./configure --prefix=$(realpath ~/opt/libfabric) CC=gcc --enable-gni=yes --enable-verbs=no --enable-ugni-static --enable-tcp=no --enable-usnic=no --enable-rxm=no --enable-mrail=no --enable-sockets=no --enable-rstream=no --enable-perf=no --enable-rxd=no --enable-udp=no --enable-hook_debug=no

MPICH 3.4a3 built against this has the same issue. If I instead use the verbs provider and verbs compatibility on Cray I get bad performance, but all message sizes work:

cookbg@nid00141:~/src/osu-micro-benchmarks-5.6.3> srun -n2 -N2 -c2 --cpu-bind=cores ./mpi/pt2pt/osu_latency
Unidentified node: Error detected by libibgni.so.  Subsequent operation may be unreliable.  IAA did not recognize this as an MPI process
Unidentified node: Error detected by libibgni.so.  Subsequent operation may be unreliable.  IAA did not recognize this as an MPI process
# OSU MPI Latency Test v5.6.3
# Size          Latency (us)
0                      10.69
1                      10.51
2                      10.80
4                      10.53
8                      10.58
16                     10.52
32                     10.51
64                     10.63
128                    10.69
256                    13.37
512                    15.69
1024                   19.08
2048                   18.51
4096                   18.70
8192                   19.75
16384                  20.74
32768                  23.97
65536                  38.41
131072                 50.26
262144                 73.14
524288                118.50
1048576               200.57
2097152               375.34
4194304               704.58

libfabric only

The stand-alone build of libfabric appears to work with the expected performance.

https://github.com/ofi-cray/cray-tests/blob/master/performance/multi-node/rdm_pingpong.c

with MAX_MSG_SIZE set to (1<<22):

./configure --with-libfabric=$(realpath ~/opt/libfabric) --with-pmi=/opt/cray/pe/pmi/default CC=gcc
cookbg@cori04:~/src/cray-tests/performance/multi-node> srun -q interactive -C haswell -t 5 -N2 -n2 -c2 --cpu-bind=cores ./rdm_pingpong
1 threads
# Libfabric Latency Test
# Size            Latency (us)        Min Lat (us)        Max Lat (us)
1                         1.37                1.37                1.37
2                         1.35                1.35                1.35
4                         1.33                1.33                1.33
8                         1.33                1.33                1.33
16                        1.33                1.33                1.33
32                        1.33                1.33                1.33
64                        1.33                1.33                1.33
128                       1.34                1.34                1.34
256                       1.36                1.36                1.36
512                       1.39                1.39                1.39
1024                      1.61                1.61                1.61
2048                      1.91                1.91                1.91
4096                      2.46                2.46                2.46
8192                      3.70                3.70                3.70
16384                    13.98               13.98               13.98
32768                    12.95               12.95               12.95
65536                    17.87               17.87               17.87
131072                   24.78               24.78               24.78
262144                   36.77               36.77               36.77
524288                   69.20               69.20               69.20
1048576                 116.06              116.06              116.06
2097152                 225.39              225.39              225.39
4194304                 440.83              440.83              440.83
@raffenet raffenet added this to the 3.4 milestone Oct 13, 2020
@brandongc
Author

Same behavior with 3.4b1

@hzhou
Contributor

hzhou commented Nov 3, 2020

NOTES: I thought #4811 should've fixed this.

@brandongc
Author

Same issue with 3.4 release

@roblatham00
Contributor

3.4.1 on Cori using libfabric-1.11.0 seems even worse:

 srun -N2 -n2 -c2 --cpu-bind=cores ./mpi/pt2pt/osu_latency
# OSU MPI Latency Test v5.6.3
# Size          Latency (us)
0                       2.02
HANG

Not sure why osu_latency is not working beyond 0 bytes. 'cpi' works for me, though, as do a few of the MPI-IO benchmarks that I tried.

@minsii
Contributor

minsii commented Feb 23, 2021

Trying mpich/main with the libfabric/master gni provider on Cori, I can reproduce the same issue where processes hang at 8K with osu_latency. Looking into the problem now.

@minsii
Contributor

minsii commented Feb 24, 2021

To narrow down this bug, I let rank 0 perform only MPI_Send while rank 1 performs only MPI_Recv. I also tried a single message size at a time (e.g., running ./osu_latency 16384:16384). An assertion failure then occurs when size >= 16384 (see below); all smaller messages were fine.

../include/ofi_atom.h:355: ofi_atomic_set32: Assertion `atomic->is_initialized' failed.
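
For reference, here is a minimal sketch of the kind of simplified send/recv reproducer described above (my own illustrative code, not the modified osu_latency source; run with two ranks and the message size as the argument):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    int size = (argc > 1) ? atoi(argv[1]) : 16384;   /* message size under test */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = calloc((size_t) size + 1, 1);
    if (rank == 0) {
        /* rank 0 performs only MPI_Send */
        MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* rank 1 performs only MPI_Recv */
        MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    free(buf);

    MPI_Finalize();
    return 0;
}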

The bug happens in the MPICH AM pipeline send/recv path with ofi/gni. Roughly speaking, it seems to be a bug in the ofi/gni provider code when using fi_sendv on the sender and fi_recvmsg with FI_MULTI_RECV on the receiver.

Below is a note showing how the situation happens:

  • mpich/sender: Since MPIDI_OFI_ENABLE_TAGGED is disabled for gni, MPI_Send internally falls back to an AM send (which internally calls MPIDI_NM_am_isend). Because MPIDI_OFI_ENABLE_RMA is also disabled for gni, the send is eventually handled by the AM pipeline (it sends the AM packet+data at MPIDI_OFI_do_am_isend_pipeline via fi_sendv).

  • mpich/receiver: To handle incoming AM packets in mpich/ofi, we post a number of fi_recvmsg buffers with FI_MULTI_RECV | FI_COMPLETION at MPIDI_OFI_mpi_init_hook. Gni internally creates a request of type GNIX_FAB_RQ_MRECV for each call to fi_recvmsg and queues the requests in a posted_queue.

  • gni/fi_cq_read on the receiver: it handles an incoming fi_sendv packet using the callback function __smsg_rndzv_iov_start. The callback internally dequeues an available posted request (see above; the request is of type GNIX_FAB_RQ_MRECV) and uses it to receive the incoming fi_sendv data (the copy happens at __gnix_rndzv_iov_req_build).

Cause of the bug: __gnix_rndzv_iov_req_build seems to incorrectly handle a GNIX_FAB_RQ_MRECV type request as GNIX_FAB_RQ_RECV. Thus, it tries to access the atomic field req->msg.outstanding_txds, which is initialized only for GNIX_FAB_RQ_RECV type requests.
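
For reference, here is a hedged, minimal sketch of the libfabric calls on each side as described above. Only the libfabric API names are real; `ep`, the buffer arguments, and the helper names are placeholders rather than the actual MPICH or provider code, and endpoint setup, memory registration, and error handling are omitted.

#include <string.h>
#include <sys/uio.h>
#include <rdma/fabric.h>
#include <rdma/fi_endpoint.h>

/* Receiver: post one AM multi-recv buffer, roughly what is described above
 * for MPIDI_OFI_mpi_init_hook.  In gni each such post becomes a
 * GNIX_FAB_RQ_MRECV request on the posted_queue. */
static ssize_t post_am_multi_recv(struct fid_ep *ep, void *buf, size_t len,
                                  void *context)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct fi_msg msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.desc = NULL;            /* see the later discussion of msg.desc */
    msg.iov_count = 1;
    msg.addr = FI_ADDR_UNSPEC;  /* accept AM packets from any peer */
    msg.context = context;

    return fi_recvmsg(ep, &msg, FI_MULTI_RECV | FI_COMPLETION);
}

/* Sender: the pipelined AM send ends up in fi_sendv with an iovec of
 * header + payload chunk (placeholder arguments, not the MPICH structs). */
static ssize_t send_am_pipeline_chunk(struct fid_ep *ep, fi_addr_t dest,
                                      void *hdr, size_t hdr_len,
                                      void *payload, size_t payload_len,
                                      void *context)
{
    struct iovec iov[2] = {
        { .iov_base = hdr, .iov_len = hdr_len },
        { .iov_base = payload, .iov_len = payload_len },
    };

    return fi_sendv(ep, iov, NULL, 2, dest, context);
}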

Will report this bug to ofi/gni developers.

@minsii
Contributor

minsii commented Feb 24, 2021

Btw, it looks like ofi/gni supports both FI_RMA and FI_TAGGED according to the provider matrix table. We should enable these capabilities in mpich/ofi. By doing so, the above bug might disappear too (hidden, but not resolved).
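
For illustration, a hedged sketch of what requesting those capabilities looks like at the libfabric level via fi_getinfo hints (the MPICH capability-set machinery itself works differently; the API version and provider name here are just example values):

#include <stdio.h>
#include <string.h>
#include <rdma/fabric.h>

/* Ask the gni provider whether it can offer FI_TAGGED and FI_RMA in
 * addition to FI_MSG, and print the capabilities it reports back. */
int query_gni_caps(void)
{
    struct fi_info *hints = fi_allocinfo();
    struct fi_info *info = NULL;
    int ret;

    hints->caps = FI_MSG | FI_TAGGED | FI_RMA;
    hints->fabric_attr->prov_name = strdup("gni");

    ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, hints, &info);
    if (ret == 0 && info)
        printf("gni offers caps 0x%llx\n", (unsigned long long) info->caps);

    fi_freeinfo(info);
    fi_freeinfo(hints);
    return ret;
}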

@hzhou
Contributor

hzhou commented Feb 24, 2021

I think this is because we deleted the GNI capability set in #2985, in particular commit bb7c0f4.

@minsii Could you try reverting that commit, setting MPIR_CVAR_OFI_USE_PROVIDER=gni, and seeing if that fixes it?

@minsii
Contributor

minsii commented Feb 24, 2021

I can try. But as I said, that only hides the bug rather than really resolving it (i.e., it will no longer trigger the AM pipeline with ofi/gni).

@hzhou
Contributor

hzhou commented Feb 24, 2021

> I can try. But as I said, that only hides the bug rather than really resolving it (i.e., it will no longer trigger the AM pipeline with ofi/gni).

I agree. We should keep following the real issue, but that shouldn't be an excuse for us to leave it broken.

@minsii
Contributor

minsii commented Feb 25, 2021

The issue is not completely resolved by #5085. We still need libfabric #6593, or changes in mpich (e.g., passing a pointer to NULL rather than NULL as msg.desc).
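
For illustration, the msg.desc workaround mentioned here would look roughly like the following when posting a multi-recv buffer (a sketch under my own naming, not the actual MPICH change):

#include <string.h>
#include <sys/uio.h>
#include <rdma/fabric.h>
#include <rdma/fi_endpoint.h>

static ssize_t post_am_multi_recv_workaround(struct fid_ep *ep, void *buf,
                                             size_t len, void *context)
{
    static void *null_desc = NULL;       /* a real descriptor slot holding NULL */
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct fi_msg msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.desc = &null_desc;               /* pointer to NULL, not NULL itself */
    msg.iov_count = 1;
    msg.addr = FI_ADDR_UNSPEC;
    msg.context = context;

    return fi_recvmsg(ep, &msg, FI_MULTI_RECV | FI_COMPLETION);
}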

@minsii minsii reopened this Feb 25, 2021
@raffenet
Contributor

Thanks @minsii! I'll cherry-pick this back to the 3.4.x branch, as well.

@raffenet
Contributor

raffenet commented Jun 1, 2021

ofiwg/libfabric#6593 was merged into libfabric, so I think we can close this and let users know to grab a recent libfabric in order to run MPICH.
