This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

CUB 1.13.0 (NVIDIA HPC SDK 21.7)

@alliepiper alliepiper released this 15 Jun 16:43
· 612 commits to main since this release
ae1721b

CUB 1.13.0 is the major release accompanying the NVIDIA HPC SDK 21.7 release.

Notable new features include support for striped data arrangements in block load/store utilities, bfloat16 radix sort support, and fewer restrictions on offset iterators in segmented device algorithms. Several bugs in cub::BlockShuffle, cub::BlockDiscontinuity, and cub::DeviceHistogram have been addressed. The amount of code generated in cub::DeviceScan has been greatly reduced, leading to significant compile-time improvements when targeting multiple PTX architectures.
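For readers unfamiliar with the term, the difference between a blocked and a striped arrangement comes down to which tile elements each thread owns. The following is a minimal host-side sketch of that index mapping only; it is not CUB code, and the helper names (blocked_index, striped_index, striped_items) are hypothetical, introduced purely for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; these are not CUB APIs.

// Blocked arrangement: thread `tid` owns a contiguous run of
// `items_per_thread` elements of the tile.
constexpr std::size_t blocked_index(std::size_t tid, std::size_t item,
                                    std::size_t items_per_thread) {
    return tid * items_per_thread + item;
}

// Striped arrangement: consecutive threads own consecutive elements,
// so one thread's items are `block_threads` elements apart.
constexpr std::size_t striped_index(std::size_t tid, std::size_t item,
                                    std::size_t block_threads) {
    return item * block_threads + tid;
}

// Collect the tile indices a single thread would touch under the
// striped arrangement.
std::vector<std::size_t> striped_items(std::size_t tid,
                                       std::size_t block_threads,
                                       std::size_t items_per_thread) {
    std::vector<std::size_t> out;
    out.reserve(items_per_thread);
    for (std::size_t item = 0; item < items_per_thread; ++item) {
        out.push_back(striped_index(tid, item, block_threads));
    }
    return out;
}
```

With 4 threads and 3 items per thread, striped_items(3, 4, 3) yields indices 3, 7, and 11, whereas the blocked mapping would give thread 3 the contiguous indices 9, 10, and 11.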

This release also includes several user-contributed documentation fixes that will be reflected in CUB's online documentation in the coming weeks.

Breaking Changes

  • #320: Deprecated cub::TexRefInputIterator<T, UNIQUE_ID>. Use cub::TexObjInputIterator<T> as a replacement.

New Features

  • #274: Add BLOCK_LOAD_STRIPED and BLOCK_STORE_STRIPED functionality to cub::BlockLoadAlgorithm and cub::BlockStoreAlgorithm. Thanks to Matthew Nicely (@mnicely) for this contribution.
  • #291: cub::DeviceSegmentedRadixSort and cub::DeviceSegmentedReduce now support different types for begin/end offset iterators. Thanks to Sergey Pavlov (@psvvsp) for this contribution.
  • #306: Add bfloat16 support to cub::DeviceRadixSort. Thanks to Xiang Gao (@zasdfgbnm) for this contribution.
  • #320: Introduce a new CUB_IGNORE_DEPRECATED_API macro that disables deprecation warnings on Thrust and CUB APIs.
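To illustrate what #291 permits, here is a host-side reference sketch of a segmented sum in which the begin and end offsets come from iterators of different types. It is not CUB's implementation and does not call CUB; the function name segmented_sum and its template parameter names are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference, not CUB's implementation.
// BeginOffsetIt and EndOffsetIt are independent template parameters,
// so the two offset sequences may have different iterator and value types.
template <typename InputIt, typename BeginOffsetIt, typename EndOffsetIt>
std::vector<long long> segmented_sum(InputIt in,
                                     std::size_t num_segments,
                                     BeginOffsetIt begin_offsets,
                                     EndOffsetIt end_offsets) {
    std::vector<long long> out(num_segments, 0);
    for (std::size_t s = 0; s < num_segments; ++s) {
        const auto first = static_cast<std::size_t>(begin_offsets[s]);
        const auto last  = static_cast<std::size_t>(end_offsets[s]);
        // Segment s covers the half-open range [first, last).
        for (std::size_t i = first; i < last; ++i) {
            out[s] += in[i];
        }
    }
    return out;
}
```

For example, the begin offsets can be a plain int array while the end offsets are read through a std::vector<std::int64_t> iterator; before this change, a single offset iterator type had to serve both roles.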

Bug Fixes

  • #277: Fixed sanitizer warnings in RadixSortScanBinsKernels. Thanks to Andy Adinets (@canonizer) for this contribution.
  • #287: cub::DeviceHistogram now correctly handles cases where OffsetT is not an int. Thanks to Dominique LaSalle (@nv-dlasalle) for this contribution.
  • #311: Fixed several bugs and added tests for the cub::BlockShuffle collective operations.
  • #312: Eliminate unnecessary kernel instantiations when compiling cub::DeviceScan. Thanks to Elias Stehle (@elstehle) for this contribution.
  • #319: Fixed out-of-bounds memory access on debugging builds of cub::BlockDiscontinuity::FlagHeadsAndTails.
  • #320: Fixed a harmless missing-return-statement warning in an unreachable cub::TexObjInputIterator code path.

Other Enhancements

  • Several documentation fixes are included in this release.
    • #275: Fixed comments describing the cub::If and cub::Equals utilities. Thanks to Rukshan Jayasekara (@rukshan99) for this contribution.
    • #290: Documented that cub::DeviceSegmentedReduce will produce consistent results run-to-run on the same device for pseudo-associative reduction operators. Thanks to Himanshu (@himanshu007-creator) for this contribution.
    • #298: CONTRIBUTING.md now refers to Thrust's build instructions for developer builds, which is the preferred way to build the CUB test harness. Thanks to Xiang Gao (@zasdfgbnm) for this contribution.
    • #301: Expand cub::DeviceScan documentation to include in-place support and add tests. Thanks to Xiang Gao (@zasdfgbnm) for this contribution.
    • #307: Expand cub::DeviceRadixSort and cub::BlockRadixSort documentation to clarify stability, in-place support, and type-specific bitwise transformations. Thanks to Himanshu (@himanshu007-creator) for this contribution.
    • #316: Move WARP_TIME_SLICING documentation to the correct location. Thanks to Peter Han (@Peter9606) for this contribution.
    • #321: Update URLs from deprecated github.com to preferred github.io. Thanks to Lilo Huang (@lilohuang) for this contribution.
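As background for the "type-specific bitwise transformations" mentioned in #307 and the bfloat16 support in #306, the sketch below shows the standard order-preserving bit transform used when radix sorting IEEE-style floating-point keys, applied to a 16-bit bfloat16 pattern. It is a host-only illustration under stated assumptions: the helper names are hypothetical, the float-to-bfloat16 conversion simply truncates the low 16 bits instead of rounding, and nothing here is taken from CUB's source.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration; these are not CUB APIs.

// Take the upper 16 bits of an IEEE-754 binary32 value as a bfloat16
// bit pattern. This truncates; a real conversion would usually round.
inline std::uint16_t float_to_bf16_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return static_cast<std::uint16_t>(u >> 16);
}

// Map a bfloat16 bit pattern to an unsigned key whose integer order
// matches the numeric order of the original values:
//   negative values: flip every bit (reverses their order),
//   non-negative values: flip only the sign bit (moves them above negatives).
inline std::uint16_t to_sortable(std::uint16_t bits) {
    const std::uint16_t sign = 0x8000u;
    const std::uint16_t mask =
        (bits & sign) ? static_cast<std::uint16_t>(0xFFFFu) : sign;
    return static_cast<std::uint16_t>(bits ^ mask);
}

// Inverse of to_sortable: recover the original bit pattern after sorting.
inline std::uint16_t from_sortable(std::uint16_t key) {
    const std::uint16_t sign = 0x8000u;
    const std::uint16_t mask =
        (key & sign) ? sign : static_cast<std::uint16_t>(0xFFFFu);
    return static_cast<std::uint16_t>(key ^ mask);
}
```

After the transform, comparing keys as unsigned integers orders -2.0 before -1.0 before 0.0 before 1.0, which is what lets a bitwise radix sort handle signed floating-point keys; from_sortable restores the original bits once the sort has finished.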