Add a host-pinned memory resource that can be used as upstream for pool_memory_resource. #1392

Merged Jan 18, 2024 · 50 commits

Commits
fae33fa
Add host_pinned_memory_resource and tests.
harrism Nov 28, 2023
15be572
Add missing maybe_unused alignment parameter and fix briefreturn
harrism Nov 28, 2023
e8c227b
Merge branch 'branch-24.02' into fea-host-pinned-mr
harrism Dec 5, 2023
2b37372
Respond to review feedback:
harrism Dec 6, 2023
c43a8c1
Add new util to get a fraction of available device mem, move availabl…
harrism Dec 19, 2023
d238daa
Deprecate old pool_mr ctors (optional initial size) and add new ctors…
harrism Dec 19, 2023
3d65d4c
Update all tests and resources to use new pool ctors and util
harrism Dec 19, 2023
66d85b4
Rename fraction_of_free_device_memory to percent_of_free_device_memory
harrism Dec 20, 2023
265de9b
clang-tidy Ignore 50 and 100 magic numbers
harrism Dec 20, 2023
0be364b
Remove straggler includes of removed file.
harrism Dec 20, 2023
266afa9
Merge branch 'branch-24.02' into fea-explicit-initial-pool-size
harrism Dec 20, 2023
5d66f40
Another missed include.
harrism Dec 20, 2023
fae5b73
Add detail::available_device_memory back as an alias of rmm::availabl…
harrism Jan 9, 2024
92c0653
merge branch 24.02
harrism Jan 9, 2024
2acf759
copyright
harrism Jan 9, 2024
a70b24e
Merge branch 'fea-explicit-initial-pool-size' into fea-host-pinned-mr
harrism Jan 9, 2024
b6edcd1
Rename file to match class and remove default alignment from some all…
harrism Jan 9, 2024
782ff55
document (and deprecate) available_device_memory alias
harrism Jan 9, 2024
4ef844a
Merge branch 'fea-explicit-initial-pool-size' into fea-host-pinned-mr
harrism Jan 9, 2024
ce58ff5
Add documentation for alignment params
harrism Jan 9, 2024
0b4c968
Respond to feedback from @wence-
harrism Jan 9, 2024
2f827a5
Merge branch 'fea-explicit-initial-pool-size' into fea-host-pinned-mr
harrism Jan 9, 2024
4f91478
Include doxygen deprecated output in docs
wence- Jan 9, 2024
f581809
Minor docstring fixes
wence- Jan 9, 2024
bafd70a
Don't use zero for default size in test.
harrism Jan 10, 2024
a77d215
Add non-detail alignment utilities
harrism Jan 10, 2024
07dffa3
Duplicate (for now) alignment utilities in rmm:: namespace since outs…
harrism Jan 10, 2024
8afff2d
Don't deprecate anything just yet (until cuDF/cuGraph updated)
harrism Jan 10, 2024
0140bd4
Merge branch 'fea-explicit-initial-pool-size' of github.com:harrism/r…
harrism Jan 10, 2024
91752c8
Make percent_of_free_device_memory do what it says on the tin.
harrism Jan 10, 2024
baf429c
Fix remaining uses of pool ctor in docs and code
harrism Jan 10, 2024
c90e81c
Fix overflow in percent_of_free_device_memory
harrism Jan 10, 2024
c2843be
Fix Cython to provide explicit initial size
harrism Jan 10, 2024
6e0aeaa
Respond to review suggestions in aligned.hpp
harrism Jan 10, 2024
c3c61e1
Fix quoted auto includes
harrism Jan 10, 2024
014ac5b
missed file for detail changes
harrism Jan 10, 2024
909b733
Add utilities doxygen group
harrism Jan 11, 2024
0fc3fba
Add utilities to sphinx docs
harrism Jan 11, 2024
9a876b5
Merge branch 'fea-explicit-initial-pool-size' into fea-host-pinned-mr
harrism Jan 11, 2024
b819738
Merge branch 'branch-24.02' into fea-host-pinned-mr
harrism Jan 15, 2024
27fe52c
Some cleanup of aligned_allocate/deallocate
harrism Jan 17, 2024
da934ba
Implement aligned alloc/dealloc and fix tests.
harrism Jan 17, 2024
7d51fea
Merge branch 'branch-24.02' into fea-host-pinned-mr
harrism Jan 17, 2024
85286b0
copyright year
harrism Jan 17, 2024
f7b0ca5
static_assert MR properties.
harrism Jan 18, 2024
52fc2f1
I don't know how those deprecated calls snuck back in.
harrism Jan 18, 2024
6162699
Rename aligned_[de]allocate to aligned_host_[de]allocate and clarify …
harrism Jan 18, 2024
fa140ae
Fix docs per feedback
harrism Jan 18, 2024
aafa18a
Factor out mr test utilities.
harrism Jan 18, 2024
92c8e23
Fix docstring for operator==
harrism Jan 18, 2024
10 changes: 5 additions & 5 deletions include/rmm/aligned.hpp
@@ -43,9 +43,9 @@ static constexpr std::size_t CUDA_ALLOCATION_ALIGNMENT{256};
/**
* @brief Returns whether or not `value` is a power of 2.
*
* @param[in] value to check.
* @param[in] value value to check.
*
* @return Whether the input a power of two with non-negative exponent
* @return True if the input a power of two with non-negative exponent, false otherwise.
Contributor suggestion:
- * @return True if the input a power of two with non-negative exponent, false otherwise.
+ * @return True if the input is a power of two with non-negative exponent, false otherwise.

Contributor: Double-nit (no need to act on it): non-negative integer exponent (all integers can be expressed as powers of two if we admit real exponents).

*/
[[nodiscard]] constexpr bool is_pow2(std::size_t value) noexcept
{
@@ -57,7 +57,7 @@ static constexpr std::size_t CUDA_ALLOCATION_ALIGNMENT{256};
*
* @param[in] alignment to check
*
* @return Whether the alignment is valid
* @return True if the alignment is valid, false otherwise.
*/
[[nodiscard]] constexpr bool is_supported_alignment(std::size_t alignment) noexcept
{
@@ -70,7 +70,7 @@ static constexpr std::size_t CUDA_ALLOCATION_ALIGNMENT{256};
* @param[in] value value to align
* @param[in] alignment amount, in bytes, must be a power of 2
*
* @return Return the aligned value, as one would expect
* @return the aligned value
*/
[[nodiscard]] constexpr std::size_t align_up(std::size_t value, std::size_t alignment) noexcept
{
@@ -84,7 +84,7 @@ static constexpr std::size_t CUDA_ALLOCATION_ALIGNMENT{256};
* @param[in] value value to align
* @param[in] alignment amount, in bytes, must be a power of 2
*
* @return Return the aligned value, as one would expect
* @return the aligned value
*/
[[nodiscard]] constexpr std::size_t align_down(std::size_t value, std::size_t alignment) noexcept
{
10 changes: 7 additions & 3 deletions include/rmm/detail/aligned.hpp
@@ -125,6 +125,7 @@ namespace rmm::detail {
* from `alloc`.
*
* If `alignment` is not a power of 2, behavior is undefined.
* If `Alloc` does not allocate host-accessible memory, behavior is undefined.
*
* @param bytes The desired size of the allocation
* @param alignment Desired alignment of allocation
@@ -137,7 +138,7 @@ namespace rmm::detail {
template <typename Alloc>
void* aligned_allocate(std::size_t bytes, std::size_t alignment, Alloc alloc)
{
assert(rmm::is_pow2(alignment));
assert(rmm::is_supported_alignment(alignment));

// allocate memory for bytes, plus potential alignment correction,
// plus store of the correction offset
@@ -179,9 +180,12 @@ void* aligned_allocate(std::size_t bytes, std::size_t alignment, Alloc alloc)
*/
template <typename Dealloc>
// NOLINTNEXTLINE(bugprone-easily-swappable-parameters)
void aligned_deallocate(void* ptr, std::size_t bytes, std::size_t alignment, Dealloc dealloc)
void aligned_deallocate(void* ptr,
[[maybe_unused]] std::size_t bytes,
[[maybe_unused]] std::size_t alignment,
Dealloc dealloc) noexcept
{
(void)alignment;
assert(rmm::is_supported_alignment(alignment));

// Get offset from the location immediately prior to the aligned pointer
// NOLINTNEXTLINE
223 changes: 223 additions & 0 deletions include/rmm/mr/pinned_host_memory_resource.hpp
@@ -0,0 +1,223 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once

#include <rmm/aligned.hpp>
#include <rmm/detail/aligned.hpp>
#include <rmm/detail/error.hpp>

#include <cuda/memory_resource>
#include <cuda/stream_ref>

#include <cuda_runtime_api.h>

#include <cstddef>
#include <utility>

namespace rmm::mr {

/**
* @brief Memory resource class for allocating pinned host memory.
*
* This class uses CUDA's `cudaHostAlloc` to allocate pinned host memory. It implements the
* `cuda::mr::memory_resource` and `cuda::mr::device_memory_resource` concepts, and
* the `cuda::mr::host_accessible` and `cuda::mr::device_accessible` properties.
*/
class pinned_host_memory_resource {
public:
// Disable clang-tidy complaining about the easily swappable size and alignment parameters
// of allocate and deallocate
// NOLINTBEGIN(bugprone-easily-swappable-parameters)

/**
* @brief Allocates pinned host memory of size at least \p bytes bytes.
*
* @throws `rmm::out_of_memory` if the requested allocation could not be fulfilled due to a
* CUDA out of memory error.
* @throws `rmm::bad_alloc` if the requested allocation could not be fulfilled due to any other
* reason.
*
* @param bytes The size, in bytes, of the allocation.
* @param alignment Alignment in bytes. Default alignment is used if unspecified.
*
* @return Pointer to the newly allocated memory.
*/
static void* allocate(std::size_t bytes,
[[maybe_unused]] std::size_t alignment = rmm::RMM_DEFAULT_HOST_ALIGNMENT)
{
// don't allocate anything if the user requested zero bytes
if (0 == bytes) { return nullptr; }

return rmm::detail::aligned_allocate(bytes, alignment, [](std::size_t size) {
void* ptr{nullptr};
RMM_CUDA_TRY_ALLOC(cudaHostAlloc(&ptr, size, cudaHostAllocDefault));
return ptr;
});
}

/**
* @brief Deallocate memory pointed to by \p ptr of size \p bytes bytes.
*
* @throws Nothing.
*
* @param ptr Pointer to be deallocated.
* @param bytes Size of the allocation.
* @param alignment Alignment in bytes. Default alignment is used if unspecified.
*/
static void deallocate(void* ptr,
std::size_t bytes,
std::size_t alignment = rmm::RMM_DEFAULT_HOST_ALIGNMENT) noexcept
{
rmm::detail::aligned_deallocate(
ptr, bytes, alignment, [](void* ptr) { RMM_ASSERT_CUDA_SUCCESS(cudaFreeHost(ptr)); });
}

/**
* @brief Allocates pinned host memory of size at least \p bytes bytes.
*
* @note Stream argument is ignored and behavior is identical to allocate.
*
* @throws `rmm::out_of_memory` if the requested allocation could not be fulfilled due to a
* CUDA out of memory error.
* @throws `rmm::bad_alloc` if the requested allocation could not be fulfilled due to any other
* error.
*
* @param bytes The size, in bytes, of the allocation.
* @param stream CUDA stream on which to perform the allocation (ignored).
* @return Pointer to the newly allocated memory.
*/
static void* allocate_async(std::size_t bytes, [[maybe_unused]] cuda::stream_ref stream)
{
return allocate(bytes);
}

/**
* @brief Allocates pinned host memory of size at least \p bytes bytes and alignment \p alignment.
*
* @note Stream argument is ignored and behavior is identical to allocate.
*
* @throws `rmm::out_of_memory` if the requested allocation could not be fulfilled due to a
* CUDA out of memory error.
* @throws `rmm::bad_alloc` if the requested allocation could not be fulfilled due to any other
* error.
*
* @param bytes The size, in bytes, of the allocation.
* @param alignment Alignment in bytes.
* @param stream CUDA stream on which to perform the allocation (ignored).
* @return Pointer to the newly allocated memory.
*/
static void* allocate_async(std::size_t bytes,
std::size_t alignment,
[[maybe_unused]] cuda::stream_ref stream)
{
return allocate(bytes, alignment);
}

/**
* @brief Deallocate memory pointed to by \p ptr of size \p bytes bytes.
*
* @note Stream argument is ignored and behavior is identical to deallocate.
*
* @throws Nothing.
*
* @param ptr Pointer to be deallocated.
* @param bytes Size of the allocation.
* @param stream CUDA stream on which to perform the deallocation (ignored).
*/
static void deallocate_async(void* ptr,
std::size_t bytes,
[[maybe_unused]] cuda::stream_ref stream) noexcept
{
return deallocate(ptr, bytes);
}

/**
* @brief Deallocate memory pointed to by \p ptr of size \p bytes bytes and alignment \p
* alignment bytes.
*
* @note Stream argument is ignored and behavior is identical to deallocate.
*
* @throws Nothing.
*
* @param ptr Pointer to be deallocated.
* @param bytes Size of the allocation.
* @param alignment Alignment in bytes.
* @param stream CUDA stream on which to perform the deallocation (ignored).
*/
static void deallocate_async(void* ptr,
std::size_t bytes,
std::size_t alignment,
[[maybe_unused]] cuda::stream_ref stream) noexcept
{
return deallocate(ptr, bytes, alignment);
}
// NOLINTEND(bugprone-easily-swappable-parameters)

/**
* @briefreturn{true if the specified resource is the same type as this resource, otherwise
* false.}
Contributor: This docstring implies it's possible to compare with another type of resource and get false, but the implementation doesn't allow that. Do we need to update the implementation or the docstrings?

Contributor: Oh yeah, I had that thought. Is there a blanket "false" implementation in the base class somehow?

Member Author: I think this is how comparison works with cuda::mr. Basically, if you try to compare with another type of resource, compilation will fail. Note that refactoring to cuda::mr will necessitate changing the semantics RMM currently (mostly) has for MR equality comparison. #1402

Member Author: Note also that pinned_host_memory_resource is NOT a device_memory_resource. It simply implements the cuda::mr::memory_resource and cuda::mr::async_memory_resource concepts.

Member Author: (Also note there is no base class.)

Member Author: I changed the docstring so it doesn't say that false can be returned. Note that we should probably follow up with more explicit tests of this MR and future MRs like it. Right now, though, our test machinery for MRs assumes they are all device_memory_resource, so while I can pass a pool_memory_resource<pinned_host_memory_resource> to all the MR tests, I can't pass just pinned_host_memory_resource currently. (It does get tested as the upstream in the former case, though, including its operator==.)

Contributor: Okay. If there's no base class, I've just lost track of how the class hierarchy works. I don't have any further comments here, but I'll need to refresh myself on how things are supposed to work someday.
*/
bool operator==(const pinned_host_memory_resource&) const { return true; }

/**
* @briefreturn{true if the specified resource is not the same type as this resource, otherwise
* false.}
*/
bool operator!=(const pinned_host_memory_resource&) const { return false; }

/**
* @brief Query whether the resource supports reporting free and available memory.
*
* @return false
*/
static bool supports_get_mem_info() { return false; }

/**
* @brief Query the total amount of memory and free memory available for allocation by this
* resource.
*
* @throws nothing
*
* @return std::pair containing 0 for both total and free memory.
*/
[[nodiscard]] static std::pair<std::size_t, std::size_t> get_mem_info(cuda::stream_ref) noexcept
{
return {0, 0};
}

/**
* @brief Enables the `cuda::mr::device_accessible` property
*
* This property declares that a `pinned_host_memory_resource` provides device accessible memory
*/
friend void get_property(pinned_host_memory_resource const&, cuda::mr::device_accessible) noexcept
{
}

/**
* @brief Enables the `cuda::mr::host_accessible` property
*
* This property declares that a `pinned_host_memory_resource` provides host accessible memory
*/
friend void get_property(pinned_host_memory_resource const&, cuda::mr::host_accessible) noexcept
{
}
};

static_assert(cuda::mr::async_resource_with<pinned_host_memory_resource,
cuda::mr::device_accessible,
cuda::mr::host_accessible>);
} // namespace rmm::mr
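To connect the new class back to the PR title, here is a hedged sketch of using it as the upstream of `pool_memory_resource`. It assumes the header paths shown in this diff and the PR's new explicit-initial-size pool constructor, and it requires CUDA, so it is not compilable standalone:

```cpp
#include <rmm/mr/device/pool_memory_resource.hpp>
#include <rmm/mr/pinned_host_memory_resource.hpp>

int main()
{
  // A pool of pinned host memory: pinned_host_memory_resource satisfies the
  // cuda::mr (async) resource concepts, so it can serve as the pool's upstream.
  rmm::mr::pinned_host_memory_resource pinned_mr;
  rmm::mr::pool_memory_resource<rmm::mr::pinned_host_memory_resource> pool_mr{
    &pinned_mr, std::size_t{1} << 26};  // explicit initial pool size (64 MiB), per this PR

  void* ptr = pool_mr.allocate(1024);
  pool_mr.deallocate(ptr, 1024);
  return 0;
}
```

The pool amortizes the relatively expensive `cudaHostAlloc`/`cudaFreeHost` calls by sub-allocating from large pinned blocks.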
34 changes: 24 additions & 10 deletions tests/mr/device/mr_ref_test.hpp
@@ -54,11 +54,25 @@ namespace rmm::test {
* @brief Returns whether a pointer points to a device-accessible memory
* allocation (device, managed, or device-accessible pinned host memory).
*/
inline bool is_device_memory(void* ptr)
inline bool is_device_accessible_memory(void* ptr)
{
cudaPointerAttributes attributes{};
if (cudaSuccess != cudaPointerGetAttributes(&attributes, ptr)) { return false; }
return (attributes.type == cudaMemoryTypeDevice) or (attributes.type == cudaMemoryTypeManaged);
return (attributes.type == cudaMemoryTypeDevice) or (attributes.type == cudaMemoryTypeManaged) or
((attributes.type == cudaMemoryTypeHost) and (attributes.devicePointer != nullptr));
}

inline bool is_host_memory(void* ptr)
{
cudaPointerAttributes attributes{};
if (cudaSuccess != cudaPointerGetAttributes(&attributes, ptr)) { return false; }
return attributes.type == cudaMemoryTypeHost;
}

inline bool is_properly_aligned(void* ptr)
{
if (is_host_memory(ptr)) { return rmm::is_pointer_aligned(ptr, rmm::RMM_DEFAULT_HOST_ALIGNMENT); }
return rmm::is_pointer_aligned(ptr, rmm::CUDA_ALLOCATION_ALIGNMENT);
}

enum size_in_bytes : size_t {};
@@ -79,8 +93,8 @@ inline void test_allocate(resource_ref ref, std::size_t bytes)
try {
void* ptr = ref.allocate(bytes);
EXPECT_NE(nullptr, ptr);
EXPECT_TRUE(rmm::is_pointer_aligned(ptr));
EXPECT_TRUE(is_device_memory(ptr));
EXPECT_TRUE(is_properly_aligned(ptr));
EXPECT_TRUE(is_device_accessible_memory(ptr));
ref.deallocate(ptr, bytes);
} catch (rmm::out_of_memory const& e) {
EXPECT_NE(std::string{e.what()}.find("out_of_memory"), std::string::npos);
@@ -95,8 +109,8 @@ inline void test_allocate_async(async_resource_ref ref,
void* ptr = ref.allocate_async(bytes, stream);
if (not stream.is_default()) { stream.synchronize(); }
EXPECT_NE(nullptr, ptr);
EXPECT_TRUE(rmm::is_pointer_aligned(ptr));
EXPECT_TRUE(is_device_memory(ptr));
EXPECT_TRUE(is_properly_aligned(ptr));
EXPECT_TRUE(is_device_accessible_memory(ptr));
ref.deallocate_async(ptr, bytes, stream);
if (not stream.is_default()) { stream.synchronize(); }
} catch (rmm::out_of_memory const& e) {
@@ -203,7 +217,7 @@ inline void test_random_allocations(resource_ref ref,
alloc.size = distribution(generator);
EXPECT_NO_THROW(alloc.ptr = ref.allocate(alloc.size));
EXPECT_NE(nullptr, alloc.ptr);
EXPECT_TRUE(rmm::is_pointer_aligned(alloc.ptr));
EXPECT_TRUE(is_properly_aligned(alloc.ptr));
});

std::for_each(allocations.begin(), allocations.end(), [&ref](allocation& alloc) {
@@ -229,7 +243,7 @@ inline void test_random_async_allocations(async_resource_ref ref,
EXPECT_NO_THROW(alloc.ptr = ref.allocate(alloc.size));
if (not stream.is_default()) { stream.synchronize(); }
EXPECT_NE(nullptr, alloc.ptr);
EXPECT_TRUE(rmm::is_pointer_aligned(alloc.ptr));
EXPECT_TRUE(is_properly_aligned(alloc.ptr));
});

std::for_each(allocations.begin(), allocations.end(), [stream, &ref](allocation& alloc) {
@@ -270,7 +284,7 @@ inline void test_mixed_random_allocation_free(resource_ref ref,
EXPECT_NO_THROW(allocations.emplace_back(ref.allocate(size), size));
auto new_allocation = allocations.back();
EXPECT_NE(nullptr, new_allocation.ptr);
EXPECT_TRUE(rmm::is_pointer_aligned(new_allocation.ptr));
EXPECT_TRUE(is_properly_aligned(new_allocation.ptr));
} else {
auto const index = static_cast<int>(index_distribution(generator) % active_allocations);
active_allocations--;
@@ -317,7 +331,7 @@ inline void test_mixed_random_async_allocation_free(async_resource_ref ref,
EXPECT_NO_THROW(allocations.emplace_back(ref.allocate_async(size, stream), size));
auto new_allocation = allocations.back();
EXPECT_NE(nullptr, new_allocation.ptr);
EXPECT_TRUE(rmm::is_pointer_aligned(new_allocation.ptr));
EXPECT_TRUE(is_properly_aligned(new_allocation.ptr));
} else {
auto const index = static_cast<int>(index_distribution(generator) % active_allocations);
active_allocations--;