transport retry count exceeded in many-to-one tests #1920
Comments
Cmd: Output:
Cmd: Output:
I think I'm getting the same with RC, on 120 orion nodes. Commandline: Output:
@alex--m it seems your case was environment- or hcoll-related and not related to the original issue. I can't reproduce it today, but it was stable during the last 3 days. Could you confirm that it works for you now as well?
@amaslenn
Don't see these anymore.
Let's wait for @alex--m's response before closing.
Checked yesterday, looks good now. Let's close it.
Configuration:
MTT: http://e2e-gw.mellanox.com:4080//mnt/lustre/users/mtt/scratch/shmem/20171017_051617_17042_17161_clx-hercules-054/html/test_stdout_l1nyGu.txt
All devices are up:
Cmd:
env OMPI_MCA_btl_openib_warn_default_gid_prefix=0 OMPI_MCA_sshmem_verbs_hca_name=mlx5_0:1 OMPI_MCA_btl_openib_if_include=mlx5_0:1 MXM_RDMA_PORTS=mlx5_0:1 UCX_NET_DEVICES=mlx5_0:1 OMPI_MCA_osc=ucx OMPI_MCA_sshmem=mmap OMPI_MCA_spml_ucx_heap_reg_nb=0 'OMPI_MCA_coll=^hcoll' OMPI_MCA_coll_hcoll_enable=0 OMPI_MCA_spml=ucx OMPI_MCA_pml=ucx UCX_TLS=dc_x SHMEM_SYMMETRIC_HEAP_SIZE=128M srun --cpu_bind=core -m block --mpi=pmi2 -n 25 --nodes=25 -p hercules /mnt/lustre/users/mtt/scratch/shmem/20171017_051617_17042_17161_clx-hercules-054/installs/h5Tu/tests/verifier/tests-mellanox.git/verifier/install/bin/oshmem_test exec --no-colour --task=analysis:tc2 --task=analysis:tc3 --task=analysis:tc4 --task=analysis:tc5 --duration 10
Output:
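The Cmd line above packs all of its settings into one long `env` prefix, which makes it hard to see at a glance which transport and collective options were in effect. As a reading aid only, here is a minimal Python sketch that splits that prefix into name/value pairs. The helper name `parse_env_prefix` and the constant `COMMAND_PREFIX` are hypothetical and not part of UCX, Open MPI, or the test suite. The string is copied from the Cmd line, truncated after the first `srun` option.

```python
import shlex

# Copied from the Cmd line in this issue; everything after the first srun
# option is left out because only the environment prefix matters here.
COMMAND_PREFIX = (
    "env OMPI_MCA_btl_openib_warn_default_gid_prefix=0 "
    "OMPI_MCA_sshmem_verbs_hca_name=mlx5_0:1 "
    "OMPI_MCA_btl_openib_if_include=mlx5_0:1 "
    "MXM_RDMA_PORTS=mlx5_0:1 UCX_NET_DEVICES=mlx5_0:1 "
    "OMPI_MCA_osc=ucx OMPI_MCA_sshmem=mmap "
    "OMPI_MCA_spml_ucx_heap_reg_nb=0 'OMPI_MCA_coll=^hcoll' "
    "OMPI_MCA_coll_hcoll_enable=0 OMPI_MCA_spml=ucx OMPI_MCA_pml=ucx "
    "UCX_TLS=dc_x SHMEM_SYMMETRIC_HEAP_SIZE=128M "
    "srun --cpu_bind=core"
)


def parse_env_prefix(command):
    """Return the NAME=value assignments that precede the launched program.

    Hypothetical helper: parsing stops at the first token that is not a
    plain NAME=value assignment (here, the `srun` launcher).
    """
    tokens = shlex.split(command)  # honours the quotes around OMPI_MCA_coll
    if tokens and tokens[0] == "env":
        tokens = tokens[1:]
    settings = {}
    for token in tokens:
        name, sep, value = token.partition("=")
        # A valid assignment has an "=" and a name made of letters,
        # digits, and underscores only.
        if not sep or not name.replace("_", "").isalnum():
            break
        settings[name] = value
    return settings


if __name__ == "__main__":
    for name, value in sorted(parse_env_prefix(COMMAND_PREFIX).items()):
        print(f"{name} = {value}")
```

Listed this way, it is easier to check that the run used `UCX_TLS=dc_x` on `mlx5_0:1` and that hcoll was excluded through `OMPI_MCA_coll=^hcoll`, which is relevant to the hcoll question raised in the comments.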