Add cache hints #1312
Comments
Btw, this can be handled really nice with |
Since alpaka is about being explicit: Maybe we should add alpaka "intrinsics" that emulate the behavior of |
A comment based on our recent experience:
    T& x() {
        return *px;
    }
    T const& x() const {
        return __ldg(px);
    }
It can be used like
    T x() const {
        return __ldg(px);
    }
but that can be expensive (or wrong) for complex types, and cannot be used to get the pointer to |
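The accessor problem in the comment above can be sketched in host-compilable C++. This is only an illustration: `ldg`, `View`, `x_cached` and the `SKETCH_FN` macro are hypothetical names, and `ldg` simply forwards to `__ldg` in CUDA device code and to a plain load elsewhere. Since `__ldg` returns its result by value, the const-reference accessor has to stay a plain dereference, and only a by-value accessor can use the cached load.

```cpp
#include <cassert>

#if defined(__CUDACC__)
#define SKETCH_FN __host__ __device__
#else
#define SKETCH_FN
#endif

// Hypothetical stand-in for CUDA's __ldg(): a read-only load that returns BY VALUE.
template <typename T>
SKETCH_FN T ldg(T const* p) {
#if defined(__CUDA_ARCH__)
    return __ldg(p);  // device code: load through the read-only data cache
#else
    return *p;        // host or other back-ends: plain load
#endif
}

// Hypothetical view over a single element, mirroring the accessors in the comment.
template <typename T>
struct View {
    T* px;

    // Mutable access: a real reference into memory.
    SKETCH_FN T& x() { return *px; }

    // Const access by reference: has to be a plain dereference, because
    // `return ldg(px);` would bind the returned reference to a temporary.
    SKETCH_FN T const& x() const { return *px; }

    // Const access by value: the only form that can use the cached load,
    // at the price of copying T and losing the element's address.
    SKETCH_FN T x_cached() const { return ldg(px); }
};

// Small host-side check of the three accessors.
inline void demo_view() {
    int storage = 41;
    View<int> v{&storage};
    v.x() += 1;
    View<int> const& cv = v;
    assert(&cv.x() == &storage);  // the reference accessor keeps the address
    assert(cv.x_cached() == 42);  // the value accessor only yields a copy
}
```

The by-value accessor is where the two interfaces diverge: callers that need the address of the element cannot use it.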
I am really not sure whether we can provide/define a cross platform behavior of architecture specific intrinsics. Taking |
Intel has a SYCL extension for FPGAs called "load-store units". See here. Maybe it would be worthwhile to copy the concept into alpaka where it could look like this:
    // We are inside an alpaka kernel
    using ReadOnlyLSU = alpaka::LoadStoreUnit<alpaka::ReadOnly>;
    auto val = ReadOnlyLSU::load(some_ptr); // If using CUDA, call __ldg() underneath. Otherwise perform a normal load if there is no equivalent.
    ReadOnlyLSU::store(some_ptr, val); // This should cause a compile-time error.
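A minimal sketch of how such a load-store unit could be written, under stated assumptions: nothing here is existing alpaka API, and the `sketch` namespace, the `ReadOnly`/`ReadWrite` tags, `LoadStoreUnit` and the `SKETCH_FN` macro are all hypothetical. The read-only unit forwards to `__ldg` only in CUDA device code and falls back to a normal load everywhere else; storing through it is rejected at compile time.

```cpp
#include <cassert>
#include <type_traits>

#if defined(__CUDACC__)
#define SKETCH_FN __host__ __device__
#else
#define SKETCH_FN
#endif

namespace sketch {

struct ReadOnly {};   // hint: the data is only read while the kernel runs
struct ReadWrite {};  // no hint: ordinary loads and stores

template <typename Hint>
struct LoadStoreUnit {
    template <typename T>
    SKETCH_FN static T load(T const* p) {
#if defined(__CUDA_ARCH__)
        if constexpr (std::is_same_v<Hint, ReadOnly>) {
            return __ldg(p);  // CUDA device code: read-only data cache load
        } else {
            return *p;
        }
#else
        return *p;  // no equivalent on this back-end: plain load
#endif
    }

    template <typename T>
    SKETCH_FN static void store(T* p, T const& value) {
        // Fires only if store() is actually used with the ReadOnly hint.
        static_assert(!std::is_same_v<Hint, ReadOnly>,
                      "cannot store through a read-only load-store unit");
        *p = value;
    }
};

}  // namespace sketch

// Host-side usage check.
inline void demo_lsu() {
    using ReadOnlyLSU = sketch::LoadStoreUnit<sketch::ReadOnly>;
    using ReadWriteLSU = sketch::LoadStoreUnit<sketch::ReadWrite>;
    int data = 5;
    assert(ReadOnlyLSU::load(&data) == 5);
    ReadWriteLSU::store(&data, 8);
    assert(ReadOnlyLSU::load(&data) == 8);
    // ReadOnlyLSU::store(&data, 1);  // would fail to compile
}
```

Making the hint a tag type keeps the call sites identical across back-ends, so only the body of `load` needs to know about architecture-specific intrinsics.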
Wouldn't that be kind of illegal in CUDA?
The point was to give a small, self-contained example, that shows the problem with the interface of
Actually we can, because the object is either a mutable
Whether
    T x() const {
        if constexpr(can_use_ldg<T>) {
            return __ldg(px);
        } else {
            return *px;
        }
    }
But the interface should be the same for all types, and using
So far we have managed to make things work using the |
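The snippet above relies on a `can_use_ldg<T>` trait that the thread does not define. One possible way to write it, as a sketch only: the trait below whitelists the built-in arithmetic types for which CUDA provides `__ldg` overloads and deliberately leaves out the vector types (`int2`, `float4`, ...) that CUDA also supports. `is_one_of_v`, `load_read_only`, `Complex` and `SKETCH_FN` are hypothetical helper names.

```cpp
#include <cassert>
#include <type_traits>

#if defined(__CUDACC__)
#define SKETCH_FN __host__ __device__
#else
#define SKETCH_FN
#endif

// True if T is exactly one of the listed types.
template <typename T, typename... Us>
inline constexpr bool is_one_of_v = (std::is_same_v<T, Us> || ...);

// Hypothetical trait: a conservative whitelist of scalar types with an __ldg overload.
template <typename T>
inline constexpr bool can_use_ldg =
    is_one_of_v<std::remove_cv_t<T>,
                char, signed char, unsigned char,
                short, unsigned short,
                int, unsigned int,
                long, unsigned long,
                long long, unsigned long long,
                float, double>;

// Read-only load that uses __ldg where possible and a plain load otherwise.
template <typename T>
SKETCH_FN T load_read_only(T const* p) {
#if defined(__CUDA_ARCH__)
    if constexpr (can_use_ldg<T>) {
        return __ldg(p);
    } else {
        return *p;
    }
#else
    return *p;
#endif
}

// A user-defined type that has no __ldg overload.
struct Complex {
    float re;
    float im;
};

static_assert(can_use_ldg<float>);
static_assert(!can_use_ldg<Complex>);

// Host-side usage check: both kinds of type go through the same call.
inline void demo_trait() {
    float f = 1.5f;
    Complex c{1.0f, 2.0f};
    assert(load_read_only(&f) == 1.5f);
    assert(load_read_only(&c).im == 2.0f);
}
```

The call site is the same for every type, but the function still returns by value, so it does not remove the copy cost for larger types mentioned earlier in the thread.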
This is a followup to #18. We require portable load/store functionality with cache hints (such as __ldg or __stwb on CUDA). This should integrate nicely with #1249 and can probably be solved in the same or a followup PR.