Use thrust::cuda::par_nosync if available #780
The only problematic parts are where we manually destroy buffers; no other code relies on the stream being synchronized. I would prefer manual synchronization plus a comment over a synchronizing execution policy in the last thrust call. It is easier to understand, and, at least for the occurrence in the merge routine, there are multiple input-dependent thrust calls within a loop, which makes it messy to pick the last one.
@@ -643,7 +644,7 @@ SegmentMergePiece<legate_type_of<CODE>> merge_all_buffers(
 return result;
 } else {
 // maybe k-way merge is more efficient here...
-auto exec_policy = thrust::cuda::par(alloc).on(stream);
+auto exec_policy = DEFAULT_POLICY(alloc).on(stream);
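The hunk above uses DEFAULT_POLICY, but its definition is not shown in this conversation. One plausible way to define it, given the PR title, is a version guard: thrust::cuda::par_nosync first shipped in Thrust 1.16.0, so older versions would fall back to thrust::cuda::par. The sketch below is an assumption about how such a macro could look, not the PR's actual code; default_policy_name and the STRINGIFY helpers are hypothetical and exist only to make the selection observable.

```cpp
#include <cassert>
#include <string>

// Pull in the Thrust version header only when it is on the include path,
// so this sketch still compiles on a machine without the CUDA toolkit.
#if defined(__has_include)
#  if __has_include(<thrust/version.h>)
#    include <thrust/version.h>
#  endif
#endif

// Assumed definition: par_nosync is available from Thrust 1.16.0,
// which corresponds to THRUST_VERSION 101600.
#if defined(THRUST_VERSION) && THRUST_VERSION >= 101600
#  define DEFAULT_POLICY thrust::cuda::par_nosync
#else
#  define DEFAULT_POLICY thrust::cuda::par
#endif

// Two-level stringification so the macro is expanded before it is quoted.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// Hypothetical helper: reports which policy DEFAULT_POLICY resolved to.
inline std::string default_policy_name() { return STRINGIFY(DEFAULT_POLICY); }
```

With a definition like this, the call site stays identical on both paths: DEFAULT_POLICY(alloc).on(stream) attaches the allocator and the stream to whichever policy was selected.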
We would need to add synchronization before the cleanup loop in L729 in order to protect buffer destruction.
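The ordering concern behind this suggestion can be sketched with a host-only toy model; it is not CUDA or Legate code. FakeStream, FakeBuffer, and run_merge_model are hypothetical names invented for illustration. The model deliberately assumes the pessimistic case, where destroying a buffer invalidates it immediately regardless of stream ordering, and counts how many queued operations would then touch a destroyed buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy stand-in for a CUDA stream: work is queued and only runs on synchronize().
struct FakeStream {
  std::queue<std::function<void()>> pending;
  void launch(std::function<void()> work) { pending.push(std::move(work)); }
  void synchronize() {
    while (!pending.empty()) {
      pending.front()();
      pending.pop();
    }
  }
};

// Toy stand-in for a manually destroyed buffer (assumed to be invalid
// the moment destroy() returns).
struct FakeBuffer {
  bool alive = true;
  void destroy() { alive = false; }
};

// Enqueues one operation per buffer, optionally synchronizes once,
// then destroys every buffer. Returns how many operations ran against
// an already destroyed buffer.
inline int run_merge_model(std::size_t num_buffers, bool sync_before_cleanup) {
  FakeStream stream;
  std::vector<FakeBuffer> buffers(num_buffers);
  int hazards = 0;

  // Models several non-synchronizing calls issued inside a loop.
  for (auto& buf : buffers) {
    stream.launch([&buf, &hazards] { if (!buf.alive) ++hazards; });
  }

  // A single explicit synchronization placed before the cleanup loop.
  if (sync_before_cleanup) stream.synchronize();

  for (auto& buf : buffers) buf.destroy();

  stream.synchronize();  // drain whatever is still queued
  return hazards;
}
```

In this model, run_merge_model(3, true) reports no hazards, while run_merge_model(3, false) reports one per buffer. A single synchronization before the cleanup loop covers every call issued in the loop, so there is no need to identify which call happens to be the last one.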
@mfoerste4 can you elaborate on why we need to protect buffer destruction?
@manopapad @mfoerste4 like I said in the meeting, the way we're destroying deferred buffers is safe, though precarious, as long as all the kernels are ordered by the same stream. And we really should come up with a better interface for asynchronous allocations/deallocations so we stay away from this brittle implicit assumption. Unless you guys spot any places that are obviously unsafe, I suggest we move forward and merge this PR.
Sounds good. I added some comments to the appropriate place in the core so that this limitation is documented: nv-legate/legate.core#566. Please check that I have it right.