
Use thrust::cuda::par_nosync if available #780

Merged
magnatelee merged 3 commits into nv-legate:branch-23.03 on Feb 14, 2023

Conversation

@magnatelee (Contributor)

No description provided.

@magnatelee magnatelee added the category:improvement PR introduces an improvement and will be classified as such in release notes label Feb 3, 2023
Files touched in review:
- src/cunumeric/utilities/thrust_util.h
- src/cunumeric/set/unique.cu
- src/cunumeric/sort/thrust_sort.cuh
- src/cunumeric/sort/sort.cu
@mfoerste4 (Contributor) left a comment:

The only problematic parts are where we manually destroy buffers; all other code does not rely on the stream being synchronized. I would prefer manual synchronization combined with an explanatory comment over using a synchronized execution policy for the last thrust call, because it is easier to understand and, at least for the occurrence in the merge routine, there are multiple thrust calls inside a loop depending on the input, which makes it messy to select the last one of them.

```diff
@@ -643,7 +644,7 @@ SegmentMergePiece<legate_type_of<CODE>> merge_all_buffers(
     return result;
   } else {
     // maybe k-way merge is more efficient here...
-    auto exec_policy = thrust::cuda::par(alloc).on(stream);
+    auto exec_policy = DEFAULT_POLICY(alloc).on(stream);
```
@mfoerste4 (Contributor):
We would need to add synchronization before the cleanup loop in L729 in order to protect buffer destruction.

@magnatelee (Contributor, Author):

@mfoerste4 can you elaborate on why we need to protect buffer destruction?

@magnatelee (Contributor, Author):

@manopapad @mfoerste4 like I said in the meeting, the way we're destroying deferred buffers is safe, though precarious, as long as all the kernels are ordered on the same stream. And we really should come up with a better interface for asynchronous allocations/deallocations so that we can move away from this brittle implicit assumption. Unless you spot any places that are obviously unsafe, I suggest we move forward and merge this PR.

@manopapad (Contributor):

> @manopapad @mfoerste4 like I said in the meeting, the way we're destroying deferred buffers is safe, though precarious, as long as all the kernels are ordered by the same stream. And we really should come up with a better interface for asynchronous allocations/deallocations so we stay away from this brittle implicit assumption. Unless you guys spot any places that are obviously unsafe, I suggest we move forward and merge this PR.

Sounds good. I added some comments to the appropriate place in the core so this limitation is documented (nv-legate/legate.core#566). Please review that I have it right.

@magnatelee magnatelee merged commit defeb57 into nv-legate:branch-23.03 Feb 14, 2023
@magnatelee magnatelee deleted the thrust_par_nosync branch February 14, 2023 04:56