Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update OpenMP Kernels #211

Merged
merged 1 commit into from
Sep 23, 2024
Merged

Conversation

radelja
Copy link
Contributor

@radelja radelja commented Sep 18, 2024

Overview

This PR updates the OpenMP kernels to address an issue with the gather kernel and aligns them closer to their v1.1 implementations. As mentioned in #189, there is still a gap in performance between the current scatter, multiscatter, and sg OpenMP kernels on certain platforms.

✨ Change Description/Rationale

  • Remove duplicate gather operation in the gather OpenMP kernel
  • Align the OpenMP kernels closer to the OpenMP kernels in v1.1
  • Use the dense_perthread buffers in the scatter and multiscatter OpenMP kernels

👀 Reviewer Checklist

  • All GitHub actions and runners have passed if applicable
  • Commits are clean and relevant

✅ PR Checklist

  • Remove or update the template boilerplate text
  • Commits are relevant and combined where appropriate
  • Rebase off spatter-devel
  • Reviewers Requested
  • Projects associated
  • Commits mention issue and/or PR numbers at the bottom of the message
  • Relevant issues are linked into the PR
  • TODOs are completed
  • Reviewer checklist is updated

🚀 TODOs

  • No additional TODOs for this PR

📌 Future Work

  • Performance alignment of the scatter, multiscatter, and sg kernels on certain platforms (Cascade Lake, Ice Lake, Sandy Bridge...)

tl[j] = sl[pattern[j]];
}
}
}

assert(dense_perthread[rand()%omp_threads][rand()%pattern_length]!=0);

std::atomic_thread_fence(std::memory_order_release);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @radelja - do we need to remove the atomic thread fence op here? I think this was added as a new "feature" for 2.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added in PR #12 to Jered's fork, and I assumed it was included as part of testing the gather kernel. Should I add this to the other kernels using the dense_perthread buffer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove it for now, I think. It needs to be behind a flag when it is added. And should be on every kernel

@plavin
Copy link
Contributor

plavin commented Sep 23, 2024

Looks good and runs fine on my machine. Once Jeff's comment about atomics is resolved that we can merge

@plavin plavin merged commit 10ec92d into hpcgarage:spatter-devel Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants