uct_ud_ep_rx_creq error at np 1280 #544
Comments
reported by @alinask
More of them - http://e2e-gw.mellanox.com:4080/hpc/scrap/users/mtt/scratch/ucx_ompi/20160705_092657_15284_33052_clx-orion-001/Test_Run-mpich_tests_mpi_comm-ompi_ofed-1.10.3rc4-ompi_ofed.html
Adding the command line here for reproduction (in case the web page expires): /hpc/local/benchmarks/hpcx_install_Monday/hpcx-gcc-redhat6.5/ompi-v1.10/bin/mpirun -np 896 -mca btl_openib_warn_default_gid_prefix 0 --bind-to core --tag-output --timestamp-output --display-map -mca pml ucx -x UCX_SHM_DEVICES=all -x UCX_NET_DEVICES=mlx5_2:1 -x UCX_ACC_DEVICES=all -mca coll_hcoll_enable 0 -x UCX_TLS=all --map-by node /hpc/scrap/users/mtt/scratch/ucx_ompi/20160705_092657_15284_33052_clx-orion-001/installs/5_Ux/tests/mpich_tests/mpich-mellanox.git/test/mpi/comm/comm_idup_overlap
Reproduces on a smaller scale (np=128): /hpc/mtr_scrap/users/mtt/scratch/ucx_ompi/20160903_025248_31719_100509_vegas27/installs/q3UC/install/bin/mpirun -np 128 -mca btl_openib_warn_default_gid_prefix 0 --bind-to core --tag-output --timestamp-output --display-map -mca pml ucx -x UCX_SHM_DEVICES=all -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_ACC_DEVICES=all -mca coll_hcoll_enable 0 -x UCX_TLS=rc_x,mm --map-by node /hpc/mtr_scrap/users/mtt/scratch/ucx_ompi/20160903_025248_31719_100509_vegas27/installs/q3UC/tests/mpich_tests/mpich-mellanox.git/test/mpi/coll/uoplong
@brminich, you have a fix for this, right?
Yes, creating a test case now.
Set connection endpoint flag to no loopback
commit 5e4eb9e
Hey guys, at np 1280 I'm seeing the following error. Is this a known issue?
I was testing IMB-MPI1 benchmarks using pml_ucx from ompi-trunk.