This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Fix segmented sort device-side launch #410

Merged: 2 commits merged on Dec 11, 2021.


gevtushenko (Collaborator)

This PR fixes #409:

  1. Host-side calls no longer synchronize with the default stream.
  2. Device-side calls synchronize the stream before using group_sizes.
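The first fix concerns host-side calls made on a user-supplied stream. Below is a minimal sketch of such a call, assuming a CUB version that ships cub::DeviceSegmentedSort. The helper sort_segments_on_stream is hypothetical and not part of CUB. It runs the usual two-phase temp-storage query and sort on a non-blocking stream, then waits only on that stream.

```cuda
#include <cub/device/device_segmented_sort.cuh>
#include <cuda_runtime.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CUB): sorts every segment of `h_keys` on a
// caller-owned non-blocking stream. `h_offsets` holds num_segments + 1 entries;
// segment i spans [h_offsets[i], h_offsets[i + 1]). Returns an empty vector if
// any CUDA call fails.
std::vector<int> sort_segments_on_stream(const std::vector<int>& h_keys,
                                         const std::vector<int>& h_offsets)
{
  assert(!h_offsets.empty());
  const int num_items    = static_cast<int>(h_keys.size());
  const int num_segments = static_cast<int>(h_offsets.size()) - 1;
  const std::size_t key_bytes    = h_keys.size() * sizeof(int);
  const std::size_t offset_bytes = h_offsets.size() * sizeof(int);

  // Keep only the first error so later failures cannot mask it.
  cudaError_t err = cudaSuccess;
  auto check = [&err](cudaError_t e) { if (err == cudaSuccess) { err = e; } };

  // A non-blocking stream does not implicitly synchronize with the legacy
  // default stream (stream 0).
  cudaStream_t stream = nullptr;
  check(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));

  int* d_in      = nullptr;
  int* d_out     = nullptr;
  int* d_offsets = nullptr;
  check(cudaMalloc(&d_in, key_bytes));
  check(cudaMalloc(&d_out, key_bytes));
  check(cudaMalloc(&d_offsets, offset_bytes));
  check(cudaMemcpyAsync(d_in, h_keys.data(), key_bytes,
                        cudaMemcpyHostToDevice, stream));
  check(cudaMemcpyAsync(d_offsets, h_offsets.data(), offset_bytes,
                        cudaMemcpyHostToDevice, stream));

  // First call with d_temp == nullptr only computes the temp storage size.
  void* d_temp = nullptr;
  std::size_t temp_bytes = 0;
  check(cub::DeviceSegmentedSort::SortKeys(d_temp, temp_bytes, d_in, d_out,
                                           num_items, num_segments,
                                           d_offsets, d_offsets + 1, stream));
  check(cudaMalloc(&d_temp, temp_bytes));

  // Second call enqueues the segmented sort on `stream`.
  check(cub::DeviceSegmentedSort::SortKeys(d_temp, temp_bytes, d_in, d_out,
                                           num_items, num_segments,
                                           d_offsets, d_offsets + 1, stream));

  std::vector<int> h_out(h_keys.size());
  check(cudaMemcpyAsync(h_out.data(), d_out, key_bytes,
                        cudaMemcpyDeviceToHost, stream));
  // Wait for this stream only, not for the whole device.
  check(cudaStreamSynchronize(stream));

  cudaFree(d_temp);
  cudaFree(d_offsets);
  cudaFree(d_out);
  cudaFree(d_in);
  cudaStreamDestroy(stream);

  return err == cudaSuccess ? h_out : std::vector<int>{};
}
```

The copies and the sort are all enqueued on `stream`, so cudaStreamSynchronize(stream) is enough before reading the result. A cudaDeviceSynchronize() would also be correct, but it would wait for unrelated work on other streams as well.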

I've also added a simple device-side launch test. To compile it, configure CMake with

cmake -DCUB_ENABLE_TESTS_WITH_RDC=ON ...

and add #define CUB_CDP to test_device_segmented_sort.cu.

alliepiper added the "P0: must have" label on Dec 3, 2021
alliepiper added this to the 1.16.0 milestone on Dec 3, 2021
gevtushenko force-pushed the main-fix/github/seg_sort_stream branch from 044f49c to 20e1ff4 on December 4, 2021 13:59
gevtushenko (Collaborator, Author) commented on Dec 4, 2021:

gpuCI: NVIDIA/thrust#1576
DVS: 30764075

gevtushenko added the "testing: gpuCI in progress", "testing: gpuCI passed", and "testing: internal ci in progress" labels and removed the "testing: gpuCI in progress" label on Dec 4, 2021
gevtushenko force-pushed the main-fix/github/seg_sort_stream branch from 20e1ff4 to 4d486f9 on December 11, 2021 10:16
gevtushenko added the "testing: internal ci passed" label and removed the "testing: internal ci in progress" label on Dec 11, 2021
gevtushenko merged commit cd8b072 into NVIDIA:main on Dec 11, 2021
Labels
"P0: must have" (Absolutely necessary. Critical issue, major blocker, etc.), "testing: gpuCI passed" (Passed gpuCI testing.), "testing: internal ci passed" (Passed internal NVIDIA CI (DVS).)
Development
Merging this pull request may close the following issue:

DeviceSegmentedSort synchronizes default stream and produces wrong results when launched from a kernel
2 participants