Fix performance bugs in scalar reductions (#509) #543

Merged
merged 1 commit into nv-legate:branch-22.07 on Aug 17, 2022

Conversation

marcinz
Collaborator

@marcinz marcinz commented Aug 17, 2022

  • Unify the template for the device reduction tree and do some cleanup
    (see the first sketch after this list)

  • Fix performance bugs in scalar reduction kernels:

  • Use unsigned 64-bit integers instead of signed integers wherever
    possible; CUDA hasn't added an atomic intrinsic for the latter yet
    (second sketch below)

  • Move reduction buffers from zero-copy memory to the framebuffer. This
    makes the slow atomic update code path in reduction operators run much
    more efficiently (third sketch below)

  • Use the new scalar reduction buffer in binary reductions as well

  • Use only the RHS type in the reduction buffer, as we never call apply

  • Minor cleanup per review

  • Rename the buffer class and method to make the intent explicit

  • Flip the polarity of reduce's template parameter
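
The sketches below illustrate three of these changes. First, the unified device reduction tree: a minimal sketch, assuming a Legion-style reduction operator with a static `fold()` method. `SumReduction`, `block_reduce`, and `reduce_kernel` are hypothetical stand-ins, not the actual classes or kernels touched by this PR.

```cpp
// Hypothetical sketch of a single templated shared-memory reduction tree
// that any scalar reduction operator can instantiate, so per-operator
// kernels no longer need their own copies of this logic.
struct SumReduction {                       // stand-in for a real reduction op
  using RHS = double;
  __device__ static constexpr RHS identity() { return 0.0; }
  __device__ static void fold(RHS& lhs, RHS rhs) { lhs += rhs; }
};

template <typename REDOP, int BLOCK_SIZE>
__device__ typename REDOP::RHS block_reduce(typename REDOP::RHS value)
{
  __shared__ typename REDOP::RHS scratch[BLOCK_SIZE];
  scratch[threadIdx.x] = value;
  __syncthreads();
  // Tree reduction: fold the upper half of the scratch buffer into the
  // lower half until scratch[0] holds the whole block's result.
  for (int stride = BLOCK_SIZE / 2; stride > 0; stride /= 2) {
    if (threadIdx.x < stride)
      REDOP::fold(scratch[threadIdx.x], scratch[threadIdx.x + stride]);
    __syncthreads();
  }
  return scratch[0];
}

template <typename REDOP, int BLOCK_SIZE>
__global__ void reduce_kernel(const typename REDOP::RHS* in, size_t n,
                              typename REDOP::RHS* out)
{
  size_t idx = blockIdx.x * size_t(BLOCK_SIZE) + threadIdx.x;
  typename REDOP::RHS value = idx < n ? in[idx] : REDOP::identity();
  value = block_reduce<REDOP, BLOCK_SIZE>(value);
  if (threadIdx.x == 0) atomicAdd(out, value);  // one atomic per block
}
```

A host launch would instantiate, say, `reduce_kernel<SumReduction, 256>` and pass a device-resident `out` buffer (see the third sketch for why that placement matters).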
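
Second, the integer-width point: CUDA's `atomicAdd` has a native overload for `unsigned long long` but not for signed `long long`, so 64-bit counters accumulated on the device should use the unsigned type; a signed accumulator has to be emulated with an `atomicCAS` loop, which is much slower under contention. A minimal illustration (the kernel and helper names are made up for this sketch):

```cpp
// Native path: unsigned 64-bit atomic add is supported in hardware.
__global__ void count_positive(const double* in, size_t n,
                               unsigned long long* count)
{
  size_t idx = blockIdx.x * size_t(blockDim.x) + threadIdx.x;
  if (idx < n && in[idx] > 0.0)
    atomicAdd(count, 1ULL);
}

// Emulated path for contrast: a signed 64-bit add has to spin on atomicCAS
// against the unsigned representation of the same word.
__device__ long long slow_signed_add(long long* addr, long long val)
{
  auto* uaddr = reinterpret_cast<unsigned long long*>(addr);
  unsigned long long old = *uaddr, assumed;
  do {
    assumed = old;
    old = atomicCAS(uaddr, assumed,
                    static_cast<unsigned long long>(
                        static_cast<long long>(assumed) + val));
  } while (assumed != old);
  return static_cast<long long>(old);
}
```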
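
Third, the buffer placement: atomics issued against a zero-copy (host-mapped) buffer cross the PCIe/NVLink bus on every update, while atomics against a framebuffer (device-resident) buffer stay on the GPU and only the final scalar is copied back once. A sketch using plain CUDA runtime calls rather than legate's allocators:

```cpp
#include <cstdio>

__global__ void accumulate(const double* in, size_t n, double* result)
{
  size_t idx = blockIdx.x * size_t(blockDim.x) + threadIdx.x;
  if (idx < n) atomicAdd(result, in[idx]);
}

int main()
{
  const size_t n = 1 << 20;
  double *d_in, *d_result, h_result = 0.0;
  cudaMalloc(&d_in, n * sizeof(double));
  cudaMemset(d_in, 0, n * sizeof(double));

  // Framebuffer-resident reduction buffer: every atomicAdd stays in device
  // memory, and only the final scalar crosses the bus.
  cudaMalloc(&d_result, sizeof(double));
  cudaMemcpy(d_result, &h_result, sizeof(double), cudaMemcpyHostToDevice);
  accumulate<<<(n + 255) / 256, 256>>>(d_in, n, d_result);
  cudaMemcpy(&h_result, d_result, sizeof(double), cudaMemcpyDeviceToHost);

  // The slower layout this PR moves away from would instead map a pinned
  // host buffer into the device address space (zero-copy), e.g. via
  // cudaHostAlloc(..., cudaHostAllocMapped) + cudaHostGetDevicePointer(),
  // making every atomicAdd above a bus transaction.

  printf("sum = %f\n", h_result);
  cudaFree(d_result);
  cudaFree(d_in);
  return 0;
}
```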

@marcinz marcinz merged commit 2959f0a into nv-legate:branch-22.07 Aug 17, 2022