Fix signature of torch allocator callbacks #1408

Closed
wants to merge 1 commit

Conversation

wence- (Contributor) commented on Dec 12, 2023

Description

The deallocation function now also takes the device id.

Since both halves of the pair now receive the device on which to perform the (de)allocation, we switch from get_current_device_resource to the (more correct) get_per_device_resource. This necessitates a workaround in Cython: rmm::cuda_device_id has no nullary constructor, so it cannot be stack-allocated in the code Cython transpiles; instead we heap-allocate it and delete it after use.
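As an illustration, here is a minimal sketch of what the reworked callbacks could look like with this workaround. The cimport paths and declared signatures are assumptions modelled on RMM's internal Cython declarations, not copied from this PR's diff:

```cython
# Sketch only: cimport paths and declarations are assumptions mirroring
# RMM's internal .pxd files rather than this PR's exact diff.
from cuda.ccudart cimport cudaStream_t

from rmm._lib.cuda_stream_view cimport cuda_stream_view
from rmm._lib.memory_resource cimport device_memory_resource
from rmm._lib.per_device_resource cimport (
    cuda_device_id,
    get_per_device_resource,
)


cdef public void* allocate(
    ssize_t size, int device, void* stream
) except * with gil:
    # cuda_device_id has no nullary constructor, so Cython cannot
    # stack-allocate it; heap-allocate it and delete it after use.
    cdef cuda_device_id* device_id = new cuda_device_id(device)
    cdef device_memory_resource* mr = get_per_device_resource(device_id[0])
    del device_id
    return mr[0].allocate(size, cuda_stream_view(<cudaStream_t>stream))


cdef public void deallocate(
    void* ptr, ssize_t size, int device, void* stream
) except * with gil:
    # The deallocation callback now also receives the device, so both
    # halves of the pair can do the same per-device resource lookup.
    cdef cuda_device_id* device_id = new cuda_device_id(device)
    cdef device_memory_resource* mr = get_per_device_resource(device_id[0])
    del device_id
    mr[0].deallocate(ptr, size, cuda_stream_view(<cudaStream_t>stream))
```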

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

- Closes rapidsai#1405
github-actions bot added the Python (Related to RMM Python API) label on Dec 12, 2023
wence- (Contributor, Author) commented on Dec 12, 2023

This is a worse version of #1407 that doesn't require me to know CMake to figure out how to build a standalone shared library.

Review thread on these lines of the diff:

```cython
) except * with gil:
    cdef device_memory_resource* mr = get_current_device_resource()
    cdef cuda_device_id* device_id
```
Contributor: Any reason not to use a smart pointer here?

wence- (Contributor, Author): I need this allocation to live for about one line. I guess I could use a smart pointer, but it seems not particularly necessary.

Contributor: Yeah, it's just one less thing to worry about if we ever add/refactor code here.
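
For comparison, a sketch of the smart-pointer variant being suggested, with the same assumed cimports as the sketch above; a unique_ptr from libcpp.memory frees the temporary cuda_device_id when it goes out of scope, so no manual del is needed:

```cython
# Sketch of the suggested smart-pointer variant; cimport paths are the
# same assumptions as in the earlier sketch.
from libcpp.memory cimport unique_ptr

from cuda.ccudart cimport cudaStream_t

from rmm._lib.cuda_stream_view cimport cuda_stream_view
from rmm._lib.memory_resource cimport device_memory_resource
from rmm._lib.per_device_resource cimport (
    cuda_device_id,
    get_per_device_resource,
)


cdef public void deallocate(
    void* ptr, ssize_t size, int device, void* stream
) except * with gil:
    # unique_ptr releases the heap-allocated cuda_device_id when it goes
    # out of scope, so there is no `del` to forget in a later refactor.
    cdef unique_ptr[cuda_device_id] device_id
    device_id.reset(new cuda_device_id(device))
    cdef device_memory_resource* mr = get_per_device_resource(
        device_id.get()[0]
    )
    mr[0].deallocate(ptr, size, cuda_stream_view(<cudaStream_t>stream))
```

Whether the extra cimport is worth it for an allocation that lives for about one line is exactly the trade-off discussed above.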

wence- (Contributor, Author) commented on Dec 12, 2023

Closing in favour of #1407.

wence- closed this on Dec 12, 2023
wence- deleted the wence/fix/1405-worse branch on Dec 13, 2023
Labels: Python (Related to RMM Python API)
Projects: None yet
Successfully merging this pull request may close these issues: [BUG] Unexpected memory usage on GPU0
2 participants