Update OpenMP Kernels #211

radelja · 2024-09-18T03:50:39Z

Overview

This PR updates the OpenMP kernels to fix an issue with the gather kernel and to align them more closely with their v1.1 implementations. As mentioned in #189, a performance gap remains for the current scatter, multiscatter, and sg OpenMP kernels on certain platforms.

✨ Change Description/Rationale

Remove duplicate gather operation in the gather OpenMP kernel
Align the OpenMP kernels closer to the OpenMP kernels in v1.1
Use the dense_perthread buffers in the scatter and multiscatter OpenMP kernels

👀 Reviewer Checklist

All GitHub actions and runners have passed if applicable
Commits are clean and relevant

✅ PR Checklist

Remove or update the template boilerplate text
Commits are relevant and combined where appropriate
Rebase off spatter-devel
Reviewers Requested
Projects associated
Commits mention issue and/or PR numbers at the bottom of the message
Relevant issues are linked into the PR
TODOs are completed
Reviewer checklist is updated

🚀 TODOs

No additional TODOs for this PR

📌 Future Work

Performance alignment of the scatter, multiscatter, and sg kernels on certain platforms (Cascade Lake, Ice Lake, Sandy Bridge...)

jyoung3131 · 2024-09-19T14:46:45Z

src/Spatter/Configuration.cc

        tl[j] = sl[pattern[j]];
      }
    }
  }

-  assert(dense_perthread[rand()%omp_threads][rand()%pattern_length]!=0);
-
-  std::atomic_thread_fence(std::memory_order_release);


Hi @radelja - do we need to remove the atomic thread fence op here? I think this was added as a new "feature" for 2.0.

This was added in PR #12 to Jered's fork, and I assumed it was included as part of testing the gather kernel. Should I add this to the other kernels using the dense_perthread buffer?

We should remove it for now, I think. It needs to be behind a flag when it is added, and it should be on every kernel.
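
The suggestion in this thread (keep the fence behind a flag and apply it to every kernel) could look roughly like the sketch below. `KernelOptions`, `fence_after_kernel`, `gather_sketch`, `thread_id`, and `max_threads` are hypothetical names, not Spatter's API. Only the inner gather statement and the `std::atomic_thread_fence` call are taken from the diff above; the surrounding loop structure is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical option struct; Spatter's real configuration has no such field.
struct KernelOptions {
  bool fence_after_kernel = false;  // off by default, as suggested in review
};

// Helpers so the sketch also builds without OpenMP enabled.
inline int thread_id() {
#ifdef _OPENMP
  return omp_get_thread_num();
#else
  return 0;
#endif
}

inline int max_threads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}

// Gather sketch: each thread writes into its own dense buffer.
// Assumes every per-thread buffer holds at least pattern.size() elements and
// that sparse is large enough for every index pattern[j] + delta * i.
void gather_sketch(const std::vector<std::size_t>& pattern,
                   const std::vector<double>& sparse,
                   std::vector<std::vector<double> >& dense_perthread,
                   std::size_t delta, std::size_t count,
                   const KernelOptions& opts) {
  assert(dense_perthread.size() >= static_cast<std::size_t>(max_threads()));
#pragma omp parallel
  {
    std::vector<double>& tl = dense_perthread[thread_id()];
#pragma omp for
    for (long i = 0; i < static_cast<long>(count); ++i) {
      const double* sl = sparse.data() + delta * static_cast<std::size_t>(i);
      for (std::size_t j = 0; j < pattern.size(); ++j) {
        tl[j] = sl[pattern[j]];
      }
    }
  }
  // The fence only runs when explicitly requested.
  if (opts.fence_after_kernel) {
    std::atomic_thread_fence(std::memory_order_release);
  }
}
```

The same trailing `if (opts.fence_after_kernel)` check would be repeated at the end of the scatter, multiscatter, and sg kernels so that all of them behave consistently.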

plavin · 2024-09-23T19:24:02Z

Looks good and runs fine on my machine. Once Jeff's comment about atomics is resolved, we can merge.

Update OpenMP Kernels to align closer to v1.1

277d6f9

radelja requested review from plavin and jyoung3131 September 18, 2024 03:50

jyoung3131 reviewed Sep 19, 2024

plavin approved these changes Sep 23, 2024

plavin merged commit 10ec92d into hpcgarage:spatter-devel Sep 23, 2024

jyoung3131 mentioned this pull request Sep 23, 2024

CPU performance of new/old Spatter varies #189

Closed

3 tasks

radelja mentioned this pull request Oct 4, 2024

🐛 [BUG] - Performance Gap for Gather-Scatter Kernel #221

Open

radelja commented Sep 18, 2024 · edited by jyoung3131

jyoung3131 Sep 19, 2024

radelja Sep 23, 2024

plavin Sep 23, 2024
