
Add a host-pinned memory resource that can be used as upstream for pool_memory_resource. #1392

Merged — 50 commits into rapidsai:branch-24.02 on Jan 18, 2024

Conversation

@harrism (Member) commented Nov 28, 2023

Description

Depends on #1417

Adds a new host_pinned_memory_resource that implements the new cuda::mr::memory_resource and cuda::mr::async_memory_resource concepts, which makes it usable as an upstream MR for rmm::mr::device_memory_resource.

Also tests a pool made with this new MR as the upstream.

Note that the tests explicitly set the initial and maximum pool sizes as using the defaults does not currently work. See #1388 .
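The RMM types themselves aren't shown here, but the setup being tested — an explicitly sized pool layered over an upstream resource — can be sketched with a standard-library analogy. In this sketch std::pmr stands in for RMM; the names and sizes are illustrative, not RMM's API:

```cpp
#include <cstddef>
#include <memory_resource>

// Build a pool with explicit options over an upstream resource and do a
// round-trip allocation; returns true on success. Analogous to constructing
// rmm::mr::pool_memory_resource with explicit initial/maximum sizes instead
// of relying on defaults.
bool pool_round_trip()
{
  // Upstream resource standing in for pinned_host_memory_resource.
  std::pmr::memory_resource* upstream = std::pmr::new_delete_resource();

  // Explicit pool options rather than the defaults (illustrative values).
  std::pmr::pool_options opts{};
  opts.max_blocks_per_chunk        = 64;
  opts.largest_required_pool_block = 1 << 20;  // 1 MiB

  std::pmr::unsynchronized_pool_resource pool{opts, upstream};

  void* ptr = pool.allocate(256, alignof(std::max_align_t));
  bool ok   = (ptr != nullptr);
  pool.deallocate(ptr, 256, alignof(std::max_align_t));
  return ok;
}
```

In the actual tests, pool_memory_resource<pinned_host_memory_resource> plays the role of the pool, with the initial and maximum sizes passed explicitly at construction.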

Closes #618

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@harrism harrism added feature request New feature or request non-breaking Non-breaking change cpp Pertains to C++ code labels Nov 28, 2023
@harrism harrism self-assigned this Nov 28, 2023
@harrism harrism requested a review from a team as a code owner November 28, 2023 03:41
@harrism harrism requested a review from miscco November 28, 2023 03:51
@miscco (Contributor) left a comment

I have some minor suggestions.

I am a bit dismayed about the amount of documentation boilerplate.

Maybe we could work with defaulted arguments rather than redefining the function each time?

@harrism (Member, Author) commented Dec 6, 2023

> Maybe we could work with defaulted arguments rather than redefining the function each time?

I am able to consolidate the non-async functions into a single allocate and deallocate (eliminating two functions). But for the async versions, we have existing calls to allocate_async(bytes, stream) that will fail if we just have allocate_async(bytes, alignment=default_alignment, stream=default_stream). So I can't consolidate these yet.

Also, suppose we did consolidate these. What should we use for default_alignment? A device memory resource should use the default CUDA memory alignment (256 bytes). But a host memory resource should probably use alignof(std::max_align_t). And a user who wants to not care about alignment will have to decide what to pass for alignment because they need it in order to use an explicit stream.

Actually, this default alignment problem applies to the non-async allocate/deallocate functions too. Because when we convert something like pool_memory_resource to the cuda::memory_resource concept what should it use if we need to provide default alignments? The underlying memory for a pool could be device or host memory. I think the pool needs to be able to figure out what its default alignment should be...

Thoughts?

 - Consolidate allocate/deallocate functions using default alignment argument.
 - Add missing includes.
@miscco (Contributor) commented Dec 6, 2023

> Actually, this default alignment problem applies to the non-async allocate/deallocate functions too.

We could define a free helper function that returns 256 on device and alignof(std::max_align_t) on host. I am not sure whether this is something that we really want.
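A minimal sketch of what such a free helper might look like — the enum and names here are hypothetical, not RMM's API; 256 mirrors the alignment guaranteed by the cudaMalloc family, and alignof(std::max_align_t) the host default:

```cpp
#include <cstddef>

// Hypothetical kind tag: whether a resource allocates host or device memory.
enum class memory_kind { host, device };

// Pick a default alignment based on the kind of memory allocated.
// 256 matches the cudaMalloc-family guarantee; alignof(std::max_align_t)
// is the safe host default.
constexpr std::size_t default_alignment(memory_kind kind)
{
  return kind == memory_kind::device ? std::size_t{256}
                                     : alignof(std::max_align_t);
}

static_assert(default_alignment(memory_kind::device) == 256);
```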

@wence- (Contributor) commented Dec 6, 2023

> Actually, this default alignment problem applies to the non-async allocate/deallocate functions too. Because when we convert something like pool_memory_resource to the cuda::memory_resource concept what should it use if we need to provide default alignments? The underlying memory for a pool could be device or host memory. I think the pool needs to be able to figure out what its default alignment should be...

What is a minimum safe value (given we don't know the type of the pointer we're allocating for)? On host, I believe the answer is alignof(std::max_align_t); on device, I think it's alignof(double4) == 16 bytes (as per https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#vector-types-alignment-requirements-in-device-code). This is smaller than the 256-byte alignment guaranteed by allocations from the cudaMalloc family, so I am not sure if that is pertinent.

Should memory resources that perform concrete allocations advertise their default alignment as a property and then wrapping resources can query that?

@miscco (Contributor) commented Dec 6, 2023

> Should memory resources that perform concrete allocations advertise their default alignment as a property and then wrapping resources can query that?

We can make this a property. The one potential design issue that our memory_resource has is that it pushes those properties into APIs. We cannot query it for a property that we did not specify in the template arguments.

@harrism (Member, Author) commented Dec 6, 2023

> We can make this a property. The one potential design issue that our memory_resource has is that it pushes those properties into APIs. We cannot query it for a property that we did not specify in the template arguments.

Can you explain a bit more? Maybe an example so I can understand what you mean?

@miscco (Contributor) commented Dec 7, 2023

> Can you explain a bit more? Maybe an example so I can understand what you mean?

The issue is that properties are awesome if you have them around. But resource_ref is type erased, so we essentially throw away all properties that are not stored in the resource_ref.

It's the difference between:

template <class T>
  requires cuda::mr::resource_with<T, cuda::mr::device_accessible>
void* special_allocate(T& memory_resource, size_t size);

void* special_allocate(cuda::mr::resource_ref<cuda::mr::device_accessible>& memory_resource, size_t size);

The latter is a streamlined implementation that reduces binary size considerably and generally simplifies the interfaces a lot. However, there is no T there anymore. In the former function I can query T with cuda::mr::has_property<T, some_property> and get a result. In the latter it is impossible, because resource_ref itself does not hold that property unless we tell it so explicitly.

It's definitely a wart.
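The wart can be illustrated in self-contained form with a plain C++17 detection idiom standing in for cuda::mr::has_property (none of these types are the real cuda::mr ones): a template still sees the concrete type and can query its properties, while a type-erased handle only exposes whatever was baked into its interface.

```cpp
#include <type_traits>

// Toy stand-ins for memory resources: one advertises "device accessible"
// via a member trait, the other does not. Names are illustrative.
struct device_resource { static constexpr bool device_accessible = true; };
struct host_resource   { static constexpr bool device_accessible = false; };

// Detection idiom, playing the role of cuda::mr::has_property.
template <class T, class = void>
struct has_device_property : std::false_type {};
template <class T>
struct has_device_property<T, std::void_t<decltype(T::device_accessible)>>
  : std::bool_constant<T::device_accessible> {};

// Templated entry point: T is still visible here, so any of its properties
// can be queried, even ones the caller never mentioned.
template <class T>
constexpr bool can_query(T const&) { return has_device_property<T>::value; }

// Type-erased entry point: once behind an abstract handle, a property that
// was not baked into the handle's interface is unrecoverable.
struct erased_resource_ref { /* no property information survives here */ };
```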

@harrism (Member, Author) commented Dec 9, 2023

Also the former requires C++20, right?

@miscco (Contributor) commented Dec 11, 2023

> Also the former requires C++20, right?

Yes, but you can also just write:

template <class T, cuda::std::enable_if_t<cuda::mr::resource_with<T, cuda::mr::device_accessible>, int> = 0>
void* special_allocate(T& memory_resource, size_t size);

@abellina left a comment

This LGTM. Since the alignment parameters are not being used, it obviously means we will need to align on our own. Our pinned code looks to be using std::max_align_t right now, just FYI.

@harrism (Member, Author) commented Jan 16, 2024

@abellina you are right we should probably fix this to actually align. I think cudaHostAlloc leaves alignment up to the caller.

@harrism (Member, Author) commented Jan 17, 2024

> @abellina you are right we should probably fix this to actually align. I think cudaHostAlloc leaves alignment up to the caller.

Done.
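The usual technique for aligning on the caller's side is to over-allocate by the requested alignment and round the pointer up, e.g. with std::align. A self-contained sketch of the approach (operator new stands in for cudaHostAlloc here; this illustrates the technique, not RMM's actual implementation):

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Round the pointer up inside an over-allocated block to satisfy an
// alignment the underlying allocator does not guarantee.
void* align_within(void* raw, std::size_t bytes, std::size_t alignment)
{
  std::size_t space = bytes + alignment;  // slack added at allocation time
  void* ptr         = raw;
  // std::align advances ptr to the first suitably aligned address within
  // [ptr, ptr + space) that leaves room for `bytes`, or returns nullptr.
  return std::align(alignment, bytes, ptr, space);
}

// Allocate with slack, align, check, and free; returns true on success.
bool aligned_round_trip(std::size_t bytes, std::size_t alignment)
{
  void* raw     = ::operator new(bytes + alignment);  // stand-in for cudaHostAlloc
  void* aligned = align_within(raw, bytes, alignment);
  bool ok       = aligned != nullptr &&
                  reinterpret_cast<std::uintptr_t>(aligned) % alignment == 0;
  ::operator delete(raw);  // free the original pointer, not the aligned one
  return ok;
}
```

Note that deallocation must pass the original (unaligned) pointer back to the allocator, so the offset has to be recoverable, e.g. by storing it just before the aligned address.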

@harrism (Member, Author) commented Jan 17, 2024

/ok to test

@harrism (Member, Author) commented Jan 18, 2024

/ok to test

@wence- wence- removed the request for review from a team January 18, 2024 09:27
@wence- (Contributor) left a comment

Minor documentation nits, otherwise looks great!

Review thread on a docstring:

* @return Whether the input a power of two with non-negative exponent
* @return True if the input a power of two with non-negative exponent, false otherwise.

Contributor:

Suggested change:

- * @return True if the input a power of two with non-negative exponent, false otherwise.
+ * @return True if the input is a power of two with non-negative exponent, false otherwise.

Contributor:

Double-nit (no need to act on it): non-negative integer exponent (all integers can be expressed as powers of two if we admit real exponents).

@harrism (Member, Author) commented Jan 18, 2024

/ok to test

@abellina commented
For me this still looks good. I was able to replace our pinned pool with a pool_memory_resource<pinned_host_memory_resource> and use the PTDS stream for all allocations. Perf overall is unchanged from our Java-based one, except for one of the NDS queries (query 83), where this shows a small improvement. I would like to confirm why that is, but my guess is that it has to do with fragmentation in the old pool.

@bdice (Contributor) left a comment

Two questions, then I'm happy to approve.

Comment on lines 170 to 171
* @briefreturn{true if the specified resource is the same type as this resource, otherwise
* false.}
Contributor:

This docstring implies it's possible to compare with another type of resource and get false, but the implementation doesn't allow that. Do we need to update the implementation or the docstrings?

Contributor:

Oh yeah, I had that thought. Is there a blanket "false" implementation in the base class somehow?

Member Author:

I think this is how comparison works with cuda::mr. Basically if you try to compare with another type of resource, compilation will fail. Note that refactoring to cuda::mr will necessitate changing the semantics RMM currently (mostly) has for MR equality comparison. #1402

Member Author:

Note also that pinned_host_memory_resource is NOT a device_memory_resource. It simply implements the cuda::mr::memory_resource and cuda::mr::async_memory_resource concepts.

Member Author:

(also note there is no base class)

Member Author:

I changed the docstring so it doesn't say that false can be returned. Note that we should probably followup with more explicit tests of this MR and future MRs like it. Right now, though, our test machinery for MRs assumes they are all device_memory_resource, so while I can pass a pool_memory_resource<pinned_host_memory_resource> to all the MR tests, I can't pass just pinned_host_memory_resource currently. (It does get tested as the upstream in the former case though, including its operator==).

Contributor:

Okay. If there's no base class, I've just lost track of how the class hierarchy works. I don't have any further comments here but I'll need to refresh myself on how things are supposed to work someday.

@harrism (Member, Author) commented Jan 18, 2024

/ok to test

@bdice (Contributor) commented Jan 18, 2024

@harrism asked me to merge once I approve, so I'll do that.

@bdice (Contributor) commented Jan 18, 2024

/merge

@rapids-bot rapids-bot bot merged commit 12f8de3 into rapidsai:branch-24.02 Jan 18, 2024
47 checks passed
Labels
cpp Pertains to C++ code feature request New feature or request non-breaking Non-breaking change
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA] Consider creating a pinned host memory resource that can leverage all of the allocation strategies
6 participants