Fix performance bugs in scalar reductions #509

Merged

Commits on Aug 5, 2022

  1. 90165d5
  2. Fix performance bugs in scalar reduction kernels:

    * Use unsigned 64-bit integers instead of signed integers wherever
      possible; CUDA hasn't added an atomic intrinsic for the latter yet.
    
    * Move reduction buffers from zero-copy memory to framebuffer. This
      makes the slow atomic update code path in reduction operators
      run much more efficiently.
    magnatelee committed Aug 5, 2022
    2c0c208
  3. b204473
  4. 14cd060
  5. Minor clean up per review

    magnatelee committed Aug 5, 2022
    9383c2a
  6. 2533b0d
  7. 7d43246
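
The two changes described in the message of commit 2c0c208 can be illustrated with a small CUDA sketch. This is not the code from the PR: every name below (`sum_u64`, `atomic_add_i64`, `sum_i64`, `reduce_i64`) is hypothetical, and the sketch assumes the remark about a missing intrinsic refers to `atomicAdd`, which has an `unsigned long long` overload but no signed 64-bit one. It also assumes a 64-bit platform with unified addressing, so mapped host memory works without extra device flags. The first kernel uses the native unsigned atomic; the second falls back to a compare-and-swap loop for signed values. The host helper places the one-element reduction buffer either in zero-copy (mapped, pinned host) memory or in device memory (the framebuffer), so the two placements can be timed against each other.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Native path: atomicAdd has an unsigned long long overload.
__global__ void sum_u64(const unsigned long long* in, size_t n, unsigned long long* out)
{
  size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n) atomicAdd(out, in[i]);
}

// Fallback path: emulate a signed 64-bit atomic add with a CAS loop.
__device__ long long atomic_add_i64(long long* addr, long long val)
{
  unsigned long long* p = reinterpret_cast<unsigned long long*>(addr);
  unsigned long long old = *p;
  unsigned long long assumed;
  do {
    assumed = old;
    // Unsigned wraparound addition yields the same bits as a two's-complement signed add.
    old = atomicCAS(p, assumed, assumed + static_cast<unsigned long long>(val));
  } while (assumed != old);  // another thread updated the value first: retry
  return static_cast<long long>(old);
}

__global__ void sum_i64(const long long* in, size_t n, long long* out)
{
  size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n) atomic_add_i64(out, in[i]);
}

// Hypothetical host helper (assumes n > 0; error checking omitted for brevity).
// zero_copy == true  : reduction buffer lives in mapped, pinned host memory.
// zero_copy == false : reduction buffer lives in device memory (framebuffer).
long long reduce_i64(const long long* host_in, size_t n, bool zero_copy)
{
  long long* d_in = nullptr;
  cudaMalloc(&d_in, n * sizeof(long long));
  cudaMemcpy(d_in, host_in, n * sizeof(long long), cudaMemcpyHostToDevice);

  long long* h_out = nullptr;  // used only on the zero-copy path
  long long* d_out = nullptr;
  if (zero_copy) {
    cudaHostAlloc(reinterpret_cast<void**>(&h_out), sizeof(long long), cudaHostAllocMapped);
    *h_out = 0;
    cudaHostGetDevicePointer(reinterpret_cast<void**>(&d_out), h_out, 0);
  } else {
    cudaMalloc(&d_out, sizeof(long long));
    cudaMemset(d_out, 0, sizeof(long long));
  }

  const unsigned threads = 256;
  const unsigned blocks  = static_cast<unsigned>((n + threads - 1) / threads);
  sum_i64<<<blocks, threads>>>(d_in, n, d_out);
  cudaDeviceSynchronize();

  long long result = 0;
  if (zero_copy) {
    result = *h_out;  // every CAS retry in the kernel went across the host interconnect
    cudaFreeHost(h_out);
  } else {
    cudaMemcpy(&result, d_out, sizeof(long long), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
  }
  cudaFree(d_in);
  return result;
}
```

Calling `reduce_i64(data, n, true)` and `reduce_i64(data, n, false)` on the same input should return the same sum. Any difference in run time would come from where the contended atomic updates land, which is the effect the commit message attributes to moving the buffers.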