
Use thrust::cuda::par_nosync if available #780

Merged
magnatelee merged 3 commits into nv-legate:branch-23.03 on Feb 14, 2023

Conversation

@magnatelee (Contributor)

No description provided.

@magnatelee magnatelee added the category:improvement PR introduces an improvement and will be classified as such in release notes label Feb 3, 2023
Files touched in review:
- src/cunumeric/utilities/thrust_util.h
- src/cunumeric/set/unique.cu
- src/cunumeric/sort/thrust_sort.cuh
- src/cunumeric/sort/sort.cu
@mfoerste4 (Contributor) left a comment:

The only problematic parts are where we manually destroy buffers; all other code does not rely on the stream being synchronized. I would prefer manual synchronization combined with an explanatory comment over using a synchronized execution policy for the last thrust call, because it is easier to understand and, at least for the occurrence in the merge routine, there are multiple thrust calls inside a loop depending on the input, which makes it messy to select the last one of them.

```diff
@@ -643,7 +644,7 @@ SegmentMergePiece<legate_type_of<CODE>> merge_all_buffers(
     return result;
   } else {
     // maybe k-way merge is more efficient here...
-    auto exec_policy = thrust::cuda::par(alloc).on(stream);
+    auto exec_policy = DEFAULT_POLICY(alloc).on(stream);
```
@mfoerste4 (Contributor):
We would need to add synchronization before the cleanup loop in L729 in order to protect buffer destruction.

@magnatelee (Contributor, Author):

@mfoerste4 can you elaborate on why we need to protect buffer destruction?

@magnatelee (Contributor, Author):

@manopapad @mfoerste4 like I said in the meeting, the way we're destroying deferred buffers is safe, though precarious, as long as all the kernels are ordered on the same stream. And we really should come up with a better interface for asynchronous allocations/deallocations so that we can move away from this brittle implicit assumption. Unless you spot any places that are obviously unsafe, I suggest we move forward and merge this PR.

@manopapad (Contributor):

> @manopapad @mfoerste4 like I said in the meeting, the way we're destroying deferred buffers is safe, though precarious, as long as all the kernels are ordered by the same stream. And we really should come up with a better interface for asynchronous allocations/deallocations so we stay away from this brittle implicit assumption. Unless you guys spot any places that are obviously unsafe, I suggest we move forward and merge this PR.

Sounds good. I added some comments to the appropriate place in the core so this limitation is documented (nv-legate/legate.core#566). Please review that I have it right.

@magnatelee magnatelee merged commit defeb57 into nv-legate:branch-23.03 Feb 14, 2023
@magnatelee magnatelee deleted the thrust_par_nosync branch February 14, 2023 04:56