300node 8GPU 4 IB NCCL TEST #1454

Open
gim4moon opened this issue Sep 19, 2024 · 4 comments

@gim4moon

Hello

Currently, we are supporting a client company in running nccl-tests.

We run the test with the script below:

mpirun -np 300 -N 1 -x NCCL_DEBUG=INFO --hostfile /nccl/hostfile \
    -mca plm_rsh_no_tree_spawn 1 -mca plm_rsh_num_concurrent 512 \
    --bind-to none -mca btl tcp,self -mca coll_hcoll_enable 0 \
    -x NCCL_SOCKET_IFNAME=bond0 \
    -x NCCL_IB_AR_THRESHOLD=0 -x NCCL_IB_PCI_RELAXED_ORDERING=1 \
    -x NCCL_IB_SPLIT_DATA_ON_QPS=0 -x NCCL_IB_QPS_PER_CONNECTION=2 -x CUDA_DEVICE_ORDER=PCI_BUS_ID \
    -x PATH -x LD_LIBRARY_PATH=$LD_LIBRARY_PATH \
    -x NCCL_NET_GDR_READ=1 -x NCCL_IGNORE_CPU_AFFINITY=1 -x NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH -x NCCL_DEBUG_SUBSYS=NET \
    /nccl/nccl-tests/build/all_reduce_perf -b 512 -e 8G -f 2 -g 8

The max busbw is only 14 GB/s.

Is there something wrong with the command? Please help me.
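
One small aside on the command itself: NCCL_DEBUG_SUBSYS is exported twice, and since an environment variable can only hold one value, only one of the two settings actually reaches the ranks (they are not merged). A single combined list keeps all of the requested subsystems, for example:

-x NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH,NET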

@kiskra-nvidia
Member

We need more info. What are the GPUs? What is the interconnect? The output of nvidia-smi and nvidia-smi topo -m from one of the nodes would be nice, as would a dump of the topology detected by NCCL. Can you include the NCCL debug output (from just one of the ranks, please! 😃), especially since you collect it already? It might be worth adding TUNING to the list of subsystems to debug...
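
A minimal sketch of how that information could be collected, reusing the hostfile and binary path from the command above (the /tmp output file names are just examples):

# On one of the nodes: hardware and topology snapshots
nvidia-smi > nvidia-smi.txt
nvidia-smi topo -m > nvidia-smi-topo.txt

# Small two-node run that adds TUNING, writes per-rank NCCL logs, and dumps the detected topology
mpirun -np 2 -N 1 --hostfile /nccl/hostfile \
    -x NCCL_DEBUG=INFO \
    -x NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH,NET,TUNING \
    -x NCCL_DEBUG_FILE=/tmp/nccl.%h.%p.log \
    -x NCCL_TOPO_DUMP_FILE=/tmp/nccl-topo.xml \
    -x PATH -x LD_LIBRARY_PATH=$LD_LIBRARY_PATH \
    /nccl/nccl-tests/build/all_reduce_perf -b 512 -e 8G -f 2 -g 8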

@gim4moon
Author

> We need more info. What are the GPUs? What is the interconnect? The output of nvidia-smi and nvidia-smi topo -m from one of the nodes would be nice, as would a dump of the topology detected by NCCL. Can you include the NCCL debug output (from just one of the ranks, please! 😃), especially since you collect it already? It might be worth adding TUNING to the list of subsystems to debug...

The node is a Dell XE9680.

The GPUs are 8× H100 per node.

The InfiniBand setup is 4× ConnectX-7 VPI cards (mlx5_0:1, mlx5_1:1, mlx5_2:1, mlx5_3:1) per node, plus 2× 200G Ethernet cards in a bonding configuration.

In the topology, the GPUs are connected to each other via NV18 (NVLink), and each GPU is connected to its NIC via PIX.

I'm sorry, but I can't provide the original nvidia-smi and topo output!

I appreciate any help you can provide.

@GeofferyGeng

> The InfiniBand setup is 4× ConnectX-7 VPI cards (mlx5_0:1, mlx5_1:1, mlx5_2:1, mlx5_3:1) per node, plus 2× 200G Ethernet cards in a bonding configuration.

NCCL's compatibility with NIC bonding is not very good, at least in the case of RoCE; I'm not sure whether the same applies to InfiniBand.

You could run the test on just a few nodes to check whether bonding is the issue.
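
A minimal sketch of such a check, assuming a small hostfile with just two nodes (the /nccl/hostfile_2nodes path is hypothetical). Pinning NCCL_IB_HCA to the four IB devices keeps the RDMA data path off the bonded Ethernet ports, while bond0 is used only for bootstrap traffic:

mpirun -np 2 -N 1 --hostfile /nccl/hostfile_2nodes \
    -x NCCL_DEBUG=INFO \
    -x NCCL_IB_HCA=mlx5_0,mlx5_1,mlx5_2,mlx5_3 \
    -x NCCL_SOCKET_IFNAME=bond0 \
    -x PATH -x LD_LIBRARY_PATH=$LD_LIBRARY_PATH \
    /nccl/nccl-tests/build/all_reduce_perf -b 512 -e 8G -f 2 -g 8

Comparing the 2-node busbw of that run against one forced onto the bonded interface (e.g. with NCCL_IB_DISABLE=1) would show whether bonding is the limiting factor.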

@gim4moon
Author

> > The InfiniBand setup is 4× ConnectX-7 VPI cards (mlx5_0:1, mlx5_1:1, mlx5_2:1, mlx5_3:1) per node, plus 2× 200G Ethernet cards in a bonding configuration.
>
> NCCL's compatibility with NIC bonding is not very good, at least in the case of RoCE; I'm not sure whether the same applies to InfiniBand.
>
> You could run the test on just a few nodes to check whether bonding is the issue.

It turned out that some of the node issues were due to faults in the SXM GPU boards and the PCI riser boards.

The faulty equipment has been replaced, and the busbw is now in the low 180 Gb/s range.

Is the current speed a good figure for a 4-NIC infrastructure with over 300 nodes?
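
For a rough point of reference (a back-of-envelope estimate, assuming each ConnectX-7 port runs at 400 Gb/s NDR, since the configured port speed was not stated):

per-node NIC bandwidth:    4 ports x 400 Gb/s = 1600 Gb/s ≈ 200 GB/s
all_reduce busbw ceiling:  roughly bounded by the per-node NIC bandwidth ≈ 200 GB/s

If the 180 figure above is in GB/s, it is around 90% of that assumed ceiling, i.e. close to line rate; if it is literally 180 Gb/s (≈ 22.5 GB/s), there would still be a large gap to what four such NICs can deliver.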
