Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT/IB/MLX5: Prevent compiler to replace SSE instructions by memmove() #9714

Merged
merged 1 commit into from
Mar 17, 2024

Conversation

tvegas1
Copy link
Contributor

@tvegas1 tvegas1 commented Feb 27, 2024

What

Prevent compiler to replace SSE instructions with memmove(). This is continuation of #9692.

Why ?

With -O2 compiler can replace code by memmove(), even when SSE types are used.

How ?

Use structure assignment like the non-SSE case to avoid memmove() optimization.

Alternative

Patch below also works, but we want to avoid to selectively and globally disable optimizations:

diff --git a/src/uct/ib/Makefile.am b/src/uct/ib/Makefile.am
index 3974e3784..cad31a1bc 100644
--- a/src/uct/ib/Makefile.am
+++ b/src/uct/ib/Makefile.am
@@ -9,7 +9,7 @@ SUBDIRS = . rdmacm

 module_LTLIBRARIES    = libuct_ib.la
 libuct_ib_la_CPPFLAGS = $(BASE_CPPFLAGS) $(IBVERBS_CPPFLAGS)
-libuct_ib_la_CFLAGS   = $(BASE_CFLAGS)
+libuct_ib_la_CFLAGS   = $(BASE_CFLAGS) -fno-tree-loop-distribute-patterns
 libuct_ib_la_LIBADD   = $(top_builddir)/src/ucs/libucs.la \
                         $(top_builddir)/src/uct/libuct.la
 libuct_ib_la_LDFLAGS  = $(IBVERBS_LDFLAGS) -version-info $(SOVERSION)

@tvegas1
Copy link
Contributor Author

tvegas1 commented Feb 27, 2024

@QuesarVII, thanks I could repro the issue. Could you please try if PR fixes all reproductions? It could be also interesting to independently confirm that the Makefile patch mentioned in Description also fixes all reproductions.

@QuesarVII
Copy link

Yes, I can confirm that the PR resolves the issue. I tested a few different MPI jobs of different sizes and quantities of communication and they are all working.

I can also confirm the no-tree-loop Makefile patch works. I tested with just that patch, without PR 9714 and 9692 being applied.

Thanks for resolving this!

Comment on lines 508 to 512
UCS_STATIC_ASSERT(MLX5_SEND_WQE_BB == 64);
_mm_storeu_si128((__m128i *)dst, *(__m128i *)src);
_mm_storeu_si128(((__m128i *)dst + 1), *((__m128i *)src + 1));
_mm_storeu_si128(((__m128i *)dst + 2), *((__m128i *)src + 2));
_mm_storeu_si128(((__m128i *)dst + 3), *((__m128i *)src + 3));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use same method as in lines 516-519?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, makes sense to try to avoid explict instructions, retested -O2/-O3 on submitter testbed. Double checked sizeof(__m128i) == 16.

@tvegas1
Copy link
Contributor Author

tvegas1 commented Feb 29, 2024

@yosefe, I guess we need it in v1.16.x also, but no need to update NEWS file since rc4 is not yet out?

@yosefe
Copy link
Contributor

yosefe commented Feb 29, 2024

@yosefe, I guess we need it in v1.16.x also, but no need to update NEWS file since rc4 is not yet out?

Pls port it to v1.16.x and update the NEWS as part of the PR

@rakhmets
Copy link
Collaborator

rakhmets commented Mar 1, 2024

@yosefe, I guess we need it in v1.16.x also, but no need to update NEWS file since rc4 is not yet out?

Pls port it to v1.16.x and update the NEWS as part of the PR

I think it is better to update NEWS once right before the publishing of RC. Otherwise, one more PR may be required just to update the date.

@tvegas1
Copy link
Contributor Author

tvegas1 commented Mar 4, 2024

/azp run

Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@tvegas1
Copy link
Contributor Author

tvegas1 commented Mar 8, 2024

/azp run

Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@yosefe
Copy link
Contributor

yosefe commented Mar 12, 2024

/azp run UCX PR

@openucx openucx deleted a comment from azure-pipelines bot Mar 12, 2024
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@openucx openucx deleted a comment from azure-pipelines bot Mar 12, 2024
@tvegas1
Copy link
Contributor Author

tvegas1 commented Mar 15, 2024

/azp run UCX PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yosefe
Copy link
Contributor

yosefe commented Mar 15, 2024

/azp run UCX PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yosefe yosefe merged commit 96c8626 into openucx:master Mar 17, 2024
142 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants