Move PREDEFINED_COMMUNICATOR_PAD back to 512 #11373
Comments
I instrumented ompi and ran a few tests, and I am not convinced that increasing PREDEFINED_COMMUNICATOR_PAD from 512 to 1024 bytes was necessary. Here are the results of some tests with 4.1.4 and main.
In 4.1.4:
In main:
I can run some tests changing the predefined_communicator_pad back to 512 and see whether I run into some problems.
(which is 64 bytes), to
and allocate the element dynamically. It's not something in the critical path.
I ran a bunch of tests setting
An update, @jsquyres suggested I compile ompi also with --enable-debug, and that did in fact put us over the limit:
Even if we add a pointer for PERUSE (8 bytes), I think the solution of making c_name a dynamic array should put us under the 512-byte limit. I will try to implement that in the next 1-2 days.
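To illustrate the c_name idea, here is a minimal sketch, not the actual patch. The `demo_` structs and helpers are hypothetical; only the 64-byte name size comes from this thread. It compares an inline name buffer with a pointer to a heap-allocated one and shows that the name is allocated only when it is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* The thread states the inline name buffer is 64 bytes; the macro name is hypothetical. */
#define DEMO_MAX_OBJECT_NAME 64

/* Before: the name lives inside the communicator and always costs 64 bytes. */
struct demo_comm_inline {
    int  c_my_rank;
    char c_name[DEMO_MAX_OBJECT_NAME];
};

/* After: only a pointer lives inside the struct; it must start out as NULL. */
struct demo_comm_dynamic {
    int   c_my_rank;
    char *c_name;
};

/* Copy the name into a freshly allocated buffer, truncating if it is too long.
 * Returns 0 on success and -1 if the allocation fails. */
int demo_set_name(struct demo_comm_dynamic *comm, const char *name)
{
    char *copy = malloc(DEMO_MAX_OBJECT_NAME);
    if (copy == NULL) {
        return -1;
    }
    strncpy(copy, name, DEMO_MAX_OBJECT_NAME - 1);
    copy[DEMO_MAX_OBJECT_NAME - 1] = '\0';
    free(comm->c_name);          /* free(NULL) is a no-op */
    comm->c_name = copy;
    return 0;
}

/* Release the name buffer when the communicator is destroyed. */
void demo_release_name(struct demo_comm_dynamic *comm)
{
    free(comm->c_name);
    comm->c_name = NULL;
}

/* Bytes saved in the struct by the name field alone (64 minus one pointer). */
size_t demo_name_field_savings(void)
{
    return sizeof(((struct demo_comm_inline *)0)->c_name) - sizeof(char *);
}
```

Setting a name is not in the critical path, so the extra allocation should not matter for performance. The destructor has to free the buffer.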
make the c_name element of the communicator structure a dynamic element.
This allows us to reduce the size of PREDEFINED_COMMUNICATOR_PAD back to 512 to maintain backwards compatibility with the ompi 4.1.x release series.
Fixes issue open-mpi#11373
Signed-off-by: Edgar Gabriel <edgar.gabriel@amd.com>
I reviewed the communicator structure and cleaned up the many unnecessary things that made their way in. As a result, I was able to bring it down to 520 bytes, in what I think is the worst case ( I will make a PR out of my patch, and together with #11405 this will give us a clean slate to work from.
@edgargabriel I was able to push a commit to your branch. The result is that More than half of this, precisely 272 bytes, is accounted for by two structures that are mostly unused: the
make the c_name element of the communicator structure a dynamic element.
This allows us to reduce the size of PREDEFINED_COMMUNICATOR_PAD back to 512 to maintain backwards compatibility with the ompi 4.1.x release series.
Reorder the communicator fields to reduce the struct size. This brings the communicator size at 536 bytes with FT, PERUSE enabled and compiled in debug mode.
Fixes issue open-mpi#11373
Signed-off-by: Edgar Gabriel <edgar.gabriel@amd.com>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
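As a small illustration of why reordering fields can shrink a struct, here is a sketch with made-up fields. The `demo_` names are hypothetical and unrelated to the real communicator layout. The compiler inserts padding so that each member is aligned, and grouping members from largest to smallest usually removes most of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small and large members interleaved: padding is inserted before each
 * larger member and at the end (typically 32 bytes total on x86-64). */
struct demo_unordered {
    uint8_t  flag_a;
    uint64_t handle;
    uint8_t  flag_b;
    uint32_t count;
    uint8_t  flag_c;
};

/* Same members sorted by decreasing alignment: only tail padding remains
 * (typically 16 bytes total on x86-64). */
struct demo_reordered {
    uint64_t handle;
    uint32_t count;
    uint8_t  flag_a;
    uint8_t  flag_b;
    uint8_t  flag_c;
};

size_t demo_size_unordered(void) { return sizeof(struct demo_unordered); }
size_t demo_size_reordered(void) { return sizeof(struct demo_reordered); }
```

The exact sizes depend on the platform ABI, so the numbers in the comments are only what one would typically see on x86-64.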
make the c_name element of the communicator structure a dynamic element.
This allows us to reduce the size of PREDEFINED_COMMUNICATOR_PAD back to 512 to maintain backwards compatibility with the ompi 4.1.x release series.
Reorder the communicator fields to reduce the struct size. This brings the communicator size at 536 bytes with FT, PERUSE enabled and compiled in debug mode.
Fixes issue open-mpi#11373
Signed-off-by: Edgar Gabriel <edgar.gabriel@amd.com>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 2d68804)
Resolved on main and v4.1.x
We want v4.x to be ABI compatible with v5.x.
Per #11365 (comment), it looks like MPI Sessions (#9097) increased PREDEFINED_COMMUNICATOR_PAD from 512 to 1024, thereby breaking ABI.
Unfortunately, it looks like sizeof a communicator struct is over 512 bytes these days. If we want to preserve 4.x and 5.x ABI, we will need to shrink this down somehow. Brian suggests moving some of the communicator data out into a secondary location that can be accessed through a pointer.
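For readers unfamiliar with the padding scheme, here is a minimal sketch of the idea, not the actual Open MPI definitions. All `demo_` names and fields are hypothetical; only the 512-byte value comes from this issue. It shows a predefined object padded to a fixed size, a compile-time check that the inner struct still fits, and rarely used data moved behind a pointer as Brian suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PREDEFINED_COMMUNICATOR_PAD. */
#define DEMO_COMMUNICATOR_PAD 512

/* Rarely used data moved to a secondary location (illustrative fields). */
struct demo_comm_extra {
    char name[64];
    int  flags;
};

/* Hypothetical communicator: the extra data costs only one pointer here. */
struct demo_communicator {
    int c_contextid;
    int c_my_rank;
    struct demo_comm_extra *c_extra;
};

/* The exported predefined object keeps a fixed size as long as the inner
 * struct stays strictly below the pad value. */
struct demo_predefined_communicator {
    struct demo_communicator comm;
    char padding[DEMO_COMMUNICATOR_PAD - sizeof(struct demo_communicator)];
};

/* Fail the build instead of silently changing the ABI. */
static_assert(sizeof(struct demo_communicator) < DEMO_COMMUNICATOR_PAD,
              "communicator struct no longer fits in the pad");
static_assert(sizeof(struct demo_predefined_communicator) == DEMO_COMMUNICATOR_PAD,
              "predefined communicator size changed");

size_t demo_predefined_size(void)
{
    return sizeof(struct demo_predefined_communicator);
}

size_t demo_pad_bytes_left(void)
{
    return DEMO_COMMUNICATOR_PAD - sizeof(struct demo_communicator);
}
```

Because applications link against the predefined object by its size, changing the pad value changes the ABI, while growing the inner struct within the pad does not.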