NVIDIA GPU Support
CUDA environment support in HPC-X enables the use of NVIDIA’s GPU memory in UCX and HCOLL communication libraries for point-to-point and collective routines, respectively.
CPU architectures: x86, PowerPC
NVIDIA GPU architectures: Tesla, Kepler, Pascal, Volta
- CUDA v8.0 or higher - for information on how to install CUDA, please refer to the NVIDIA CUDA Toolkit documentation
- Mellanox OFED GPUDirect RDMA plugin module - for information on how to install:
  - Mellanox OFED - please refer to MLNX_OFED
  - GPUDirect RDMA - please refer to Mellanox OFED GPUDirect RDMA
Once the NVIDIA software components are installed, verify that the GPUDirect RDMA kernel module is properly loaded on each compute node where you plan to run jobs that require GPUDirect RDMA.
To check whether the GPUDirect RDMA module is loaded, run:
service nv_peer_mem status
On Linux distributions that do not provide this service, run instead:
lsmod | grep nv_peer_mem
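The lsmod check above can be wrapped in a small helper that reports whether a given module appears in lsmod-style output. The helper name and the sample output below are illustrative, not from the HPC-X distribution:

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel module appears in
# lsmod-style output. In real use, pipe `lsmod` into it; here the logic
# is demonstrated on a captured sample.
module_loaded() {
    # $1 = module name; reads lsmod output on stdin.
    # lsmod prints "name  size  used_by", so match the name at line start.
    grep -q "^$1 " && echo "loaded" || echo "not loaded"
}

# Sample lsmod output (assumed, not from a live system):
printf 'nv_peer_mem 16384 0\nnvidia 39059456 100\n' | module_loaded nv_peer_mem
# prints "loaded"
```

On a real node you would run `lsmod | module_loaded nv_peer_mem` and refuse to launch the job if the result is "not loaded".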
- GDR COPY plugin module
GDR COPY is a fast copy library from NVIDIA used to transfer data between host and GPU memory. For information on how to install GDR COPY, please refer to its GitHub page. Once GDR COPY is installed, verify that the gdrcopy kernel module is properly loaded on each compute node where you plan to run jobs that require GDR COPY.
To check whether the GDR COPY module is loaded, run:
lsmod | grep gdrdrv
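Since both module checks must pass before a job launch, they can be combined into one pre-flight sweep. The sketch below reads `/sys/module`, where loaded modules appear as directories (roughly equivalent to the lsmod checks above); the loop and its output format are an assumption, not part of HPC-X:

```shell
#!/bin/sh
# Sketch: pre-flight check that both required kernel modules are present
# on the local node. Module names nv_peer_mem and gdrdrv come from the
# verification steps above; /sys/module/<name> exists when a module is loaded.
for mod in nv_peer_mem gdrdrv; do
    if [ -d "/sys/module/$mod" ]; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done
```

Run this on each compute node (for example via your scheduler's prolog script) and abort the launch if any module reports "not loaded".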
Configuration flags:
--with-cuda=<cuda/runtime/install/path>
--with-gdrcopy=<gdr_copy/install/path>
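As a sketch, these flags would be passed when configuring UCX from source. The install prefix and paths below are placeholders; substitute your actual CUDA and GDR COPY install locations:

```shell
./configure --prefix=/opt/ucx \
    --with-cuda=/usr/local/cuda \
    --with-gdrcopy=/opt/gdrcopy
```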
List of CUDA transports (TLs):
cuda_copy : CUDA copy transport for staging protocols
gdr_copy : GDR COPY transport for faster host-to-GPU copies of small buffers
cuda_ipc : intra-node peer-to-peer (P2P) device transfers
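At run time, these transports can be enabled explicitly through UCX's `UCX_TLS` environment variable. The launch line below is a hypothetical example: `my_cuda_app` is a placeholder application, and the transport list assumes an InfiniBand fabric using the `rc` transport alongside the CUDA transports listed above:

```shell
# Hypothetical launch with HPC-X's Open MPI: export UCX_TLS to the ranks
# so UCX selects the CUDA transports in addition to RC over InfiniBand.
mpirun -np 2 -x UCX_TLS=rc,cuda_copy,cuda_ipc,gdr_copy ./my_cuda_app
```

If `UCX_TLS` is left unset, UCX selects transports automatically, which is usually the right default.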