UCT/IB/MLX5: Prevent compiler to replace SSE instructions by memmove() #9714
@QuesarVII, thanks, I could reproduce the issue. Could you please check whether the PR fixes all reproductions? It would also be interesting to independently confirm that the Makefile patch mentioned in the Description also fixes all reproductions.
Yes, I can confirm that the PR resolves the issue. I tested several MPI jobs with different sizes and communication volumes, and they all work. I can also confirm that the no-tree-loop Makefile patch works: I tested it alone, without PRs 9714 and 9692 applied. Thanks for resolving this!
Review comment on src/uct/ib/mlx5/ib_mlx5.inl (outdated revision):
```c
UCS_STATIC_ASSERT(MLX5_SEND_WQE_BB == 64);
_mm_storeu_si128((__m128i *)dst, *(__m128i *)src);
_mm_storeu_si128(((__m128i *)dst + 1), *((__m128i *)src + 1));
_mm_storeu_si128(((__m128i *)dst + 2), *((__m128i *)src + 2));
_mm_storeu_si128(((__m128i *)dst + 3), *((__m128i *)src + 3));
```
Maybe use the same method as in lines 516-519?
Fixed; it makes sense to avoid explicit instructions where possible. Retested with -O2 and -O3 on the submitter's testbed, and double-checked that sizeof(__m128i) == 16.
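A minimal sketch of the structure-assignment approach discussed above, assuming a 64-byte block made of four 16-byte __m128i words (matching the MLX5_SEND_WQE_BB == 64 assertion in the reviewed snippet). The names example_wqe_bb_t, example_copy_bb and EXAMPLE_WQE_BB are hypothetical; this is not the actual UCX code referred to as lines 516-519. Note that the C standard does not forbid a compiler from lowering a struct copy to a library call either; the PR only reports that this form avoided memmove() at -O2/-O3 on the submitter's testbed.

```c
#include <emmintrin.h> /* SSE2: __m128i */

/* Assumed block size; mirrors UCS_STATIC_ASSERT(MLX5_SEND_WQE_BB == 64). */
#define EXAMPLE_WQE_BB 64

/* Hypothetical type: one 64-byte block viewed as four 16-byte SSE words. */
typedef struct {
    __m128i w[EXAMPLE_WQE_BB / 16];
} example_wqe_bb_t;

/* The checks mentioned in the review: __m128i is 16 bytes, so the struct
 * covers exactly one 64-byte block with no padding. */
_Static_assert(sizeof(__m128i) == 16, "__m128i must be 16 bytes");
_Static_assert(sizeof(example_wqe_bb_t) == EXAMPLE_WQE_BB,
               "block type must be exactly 64 bytes");

/* Copy one block with a plain structure assignment instead of explicit
 * _mm_storeu_si128() calls. Both pointers must be 16-byte aligned, because
 * the struct inherits the alignment of __m128i. */
static inline void example_copy_bb(void *dst, const void *src)
{
    *(example_wqe_bb_t *)dst = *(const example_wqe_bb_t *)src;
}
```

Letting the compiler see a single aggregate copy of a known, small size (rather than a sequence it may pattern-match into a generic copy loop) is the same idea the non-SSE path already uses; whether a given compiler version keeps it inline should still be checked in the generated assembly.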
@yosefe, I guess we also need this in v1.16.x, but there is no need to update the NEWS file since rc4 is not out yet?
Please port it to v1.16.x and update NEWS as part of the PR.
I think it is better to update NEWS once, right before the RC is published. Otherwise, another PR may be needed just to update the date.
/azp run
Azure Pipelines successfully started running 4 pipeline(s).
What
Prevent the compiler from replacing SSE instructions with memmove(). This is a continuation of #9692.
Why?
With -O2, the compiler can replace the copy code with a call to memmove(), even when SSE types are used.
How?
Use structure assignment, as in the non-SSE case, to avoid the memmove() optimization.
Alternative
The patch below also works, but we want to avoid disabling optimizations selectively and globally: