/perftest# MLX5_SCATTER_TO_CQE=0 ./ib_write_bw -d mlx5_1 --use_cuda=1 -a -x 7 --ipv6 172.16.126.66 --report_gbits -F --use_cuda_dmabuf
Perftest doesn't supports CUDA tests with inline messages: inline size set to 0
initializing CUDA
Listing all CUDA devices in system:
CUDA device 0: PCIe address is 1B:00
CUDA device 1: PCIe address is 29:00
CUDA device 2: PCIe address is 45:00
CUDA device 3: PCIe address is 4E:00
Picking device No. 1
[pid = 13194, dev = 1] device name = [NVIDIA H100 80GB HBM3]
creating CUDA Ctx
making it the current CUDA Ctx
CUDA device integrated: 0
cuMemAlloc() of a 16777216 bytes GPU buffer
allocated GPU buffer address at 00007f4d26800000 pointer=0x7f4d26800000
using DMA-BUF for GPU buffer address at 0x7f4d26800000 aligned at 0x7f4d26800000 with aligned size 16777216
Calling ibv_reg_dmabuf_mr(offset=0, size=16777216, addr=0x7f4d26800000, fd=43) for QP #0
Couldn't allocate MR with error=93
failed to create mr
Failed to create MR
Couldn't create IB resources
destroying current CUDA Ctx
Cuda info:
/perftest# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jul_11_02:20:44_PDT_2023
Cuda compilation tools, release 12.2, V12.2.128
Build cuda_12.2.r12.2/compiler.33053471_0
Hi, I'm testing RDMA over RoCEv2 and we're using dma-buf instead of nv-peer-mem. It's failing, and I'm unsure why or how to fix it.
I set up the test on Ubuntu Kubernetes Pods (based on the NVIDIA NGC image) and installed:
Then I ran the following command between them:
The output I got was:
Note the non-RDMA/CUDA one works fine
Any thoughts / ideas would be appreciated
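For reference, the "Couldn't allocate MR with error=93" line looks like it reports a raw errno value. The snippet below is a minimal sketch for decoding it, assuming the number perftest prints is a standard Linux errno (the helper name describe_errno is made up for illustration). On Linux, 93 maps to EPROTONOSUPPORT, which does not by itself say which layer rejected the dma-buf registration.

```python
import errno
import os


def describe_errno(code: int) -> tuple[str, str]:
    """Return (symbolic name, message) for an errno value on this platform.

    Hypothetical helper: assumes the number is a standard errno,
    as the perftest message suggests but does not confirm.
    """
    # errno.errorcode maps numeric values to names such as "EPROTONOSUPPORT".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's human-readable message for the code.
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errno(93)
    print(f"errno 93: {name} ({message})")
```

Errno numbering is platform-specific, so this should be run on the same Linux host or Pod where ib_write_bw failed.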