[mtt] Timeout in mpi_test_suite with HW TM #1926
Reproduced with Output:
But this is not the same symptom as the original issue. Also, the original issue used an older UCX, which didn't have TM in dc_x.
Right, I missed the 0x68.
Yes, because we had TM enabled for DC verbs for a while.
Configuration:
MTT: http://e2e-gw.mellanox.com:4080//hpc/scrap/users/mtt/scratch/hcol/20171016_215200_22084_17122_clx-orion-071/html/test_stdout_zAiLea.txt
Cmd:
mpirun -np 1008 --debug-daemons --display-map --bind-to core --map-by node -mca pml ucx -mca btl_openib_warn_default_gid_prefix 0 -mca btl_openib_if_include mlx5_0:1 --timestamp-output -x HCOLL_IB_IF_INCLUDE=mlx5_0:1 -x MXM_RDMA_PORTS=mlx5_0:1 -x HCOLL_ENABLE_MCAST_ALL=0 -x HCOLL_MCAST_NP=5 -x HCOLL_CONTEXT_CACHE_ENABLE=0 -x UCX_SHM_DEVICES=all -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_ACC_DEVICES=all -x HCOLL_ENABLE_SHARP=0 -x HCOLL_ENABLE_TOPOLOGY=0 -x HCOLL_BCOL_P2P_MCAST_ALLREDUCE_ALG=1 /hpc/scrap/users/mtt/scratch/hcol/20171016_215200_22084_17122_clx-orion-071/installs/gzmC/tests/mpi-test-suite/ompi-tests/mpi_test_suite/mpi_test_suite -x relaxed -t 'Alltoall' -d 'MPI_CONTIGUOUS_INT' -n 300
Output:
MXM works:
Also reproduced without HCOLL: