Does high-level cuQuantum API work with MPI? #112
-
Hi, I browsed the documentation and examples in the repo, but I failed to find any mention or examples of whether one could use the high-level tensor network API with MPI. Related, I wonder if there is any connection between the high-level API and the distributed-parallelization API. Apologies if I have missed the obvious or did not dive into the API reference deeply enough to understand this myself! Many thanks!
-
Yes, the high-level tensor network API fully supports distributed parallel execution via MPI on multiple/many GPUs. The way to activate distributed parallel execution is exactly the same as before (example: python/samples/cutensornet/tensornet_example_mpi_auto.py, line 138 at commit 2ac5645). Of course, this requires a distributed GPU platform with CUDA-aware MPI installed (see the distributed-parallelization API docs: https://docs.nvidia.com/cuda/cuquantum/latest/cutensornet/api/functions.html#distributed-parallelization-api), as well as the usual MPI bookkeeping (initialization, etc.). Other than that, exactly the same Python (or C++) code can run on both a single GPU and multiple/many GPUs without any additional effort from the user. To benefit from distributed parallel execution, the problem size needs to be sufficiently large.
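To make the flow concrete, here is a minimal sketch modeled on the tensornet_example_mpi_auto.py sample. The einsum expression, operand shapes, and the random seed are illustrative placeholders, not values from this thread; it assumes cuquantum-python, cupy, and mpi4py on a system with CUDA-aware MPI, and that `$CUTENSORNET_COMM_LIB` points at the MPI interface library as described in the docs:

```python
# Hedged sketch of distributed contraction with the high-level API,
# modeled on python/samples/cutensornet/tensornet_example_mpi_auto.py.
# Assumes CUDA-aware MPI and $CUTENSORNET_COMM_LIB set appropriately.
from mpi4py import MPI
import cupy as cp
from cuquantum import contract
from cuquantum import cutensornet as cutn

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Bind each MPI rank to a GPU (simple round-robin assignment).
device_id = rank % cp.cuda.runtime.getDeviceCount()
cp.cuda.Device(device_id).use()

# Attach the MPI communicator to the library handle; cuTensorNet then
# parallelizes the path search and contraction across all ranks.
handle = cutn.create()
cutn.distributed_reset_configuration(
    handle, *cutn.get_mpi_comm_pointer(comm))

# Every rank must hold identical operands; a fixed seed is one simple
# way to ensure that (broadcasting from a root rank also works).
cp.random.seed(42)
a = cp.random.random((64, 64))
b = cp.random.random((64, 64))

# Exactly the same call one would write for a single GPU.
result = contract("ij,jk->ik", a, b,
                  options={"device_id": device_id, "handle": handle})

cutn.destroy(handle)
```

Launched with, e.g., `mpiexec -n 4 python example.py`, the contraction is distributed across four processes; run without `mpiexec`, the same script executes on a single GPU unchanged.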
-
Btw, in your current high-level API documentation (https://docs.nvidia.com/cuda/cuquantum/latest/python/api/cutensornet.html#high-level-tensor-network-api) you seem to have forgotten the …