Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move PREDEFINED_COMMUNICATOR_PAD back to 512 #11373

Closed
jsquyres opened this issue Feb 2, 2023 · 7 comments
Closed

Move PREDEFINED_COMMUNICATOR_PAD back to 512 #11373

jsquyres opened this issue Feb 2, 2023 · 7 comments

Comments

@jsquyres
Copy link
Member

jsquyres commented Feb 2, 2023

We want v4.x to be ABI compatible with v5.x.

Per #11365 (comment), it looks like MPI Sessions (#9097) increased PREDEFINED_COMMUNICATOR_PAD from 512 to 1024, thereby breaking ABI.

Unfortunately, it looks like sizeof a communicator struct is over 512 bytes these days. If we want to preserve 4.x and 5.x ABI, we will need to shrink this down somehow. Brian suggests moving some of the communicator data out into a secondary location that can be accessed through a pointer.

@edgargabriel
Copy link
Member

edgargabriel commented Feb 7, 2023

I instrumented ompi and ran a few tests, and I am not convinced that increasing the predefined_communicator_pad from 512 to 1024 bytes was necessary. Here are the results of some tests with 4.1.4 and main:

in 4.1.4:

sizeof ompi_communicator_t :  352 bytes

in main:

sizeof ompi_communicator_t without ft support:  424 bytes
sizeof ompi_communicator_t with ft support: 448 bytes
sizeof ompi_communicator_t with ft + peruse: 456 bytes (estimated, I do not have peruse, but its one additional pointer)

I can run some tests changing the predefined_communicator_pad back to 512 and see whether I run into some problems.
Otherwise, if we would need to save a few bytes, the easiest option might be to convert the c_name from

 char  c_name[MPI_MAX_OBJECT_NAME];

(which is 64 bytes), to

char *c_name

and allocate the element dynamically. Its not something in the critical path.

@edgargabriel
Copy link
Member

I ran a bunch of tests setting ompi_predefined_communicator_pad to 512, and haven't observed any issues.

@edgargabriel
Copy link
Member

An update, @jsquyres suggested I compile ompi also with --enable-debug, and that did in fact put us over the limit:

sizeof ompi_communicator_t without ft: 512 bytes
sizeof ompi_communicator_t with ft : 536 bytes

Even if we add a pointer for peruse (8 bytes), I think the solution of making c_name a dynamic array should put us under the 512 bytes limit. I will try to implement that in the next 1-2 days

edgargabriel added a commit to edgargabriel/ompi that referenced this issue Feb 11, 2023
make the c_name element of the communicator structure a dynamic
element. This allows us to reduce the size of PREDEFINED_COMMUNICATOR_PAD
back to 512 to maintain backwards compatibility with the ompi 4.1.x release
series.

Fixes issue open-mpi#11373

Signed-off-by: Edgar Gabriel <edgar.gabriel@amd.com>
@bosilca
Copy link
Member

bosilca commented Feb 11, 2023

I reviewed the communicator structure, and cleaned up the many unnecessary things that made their way in. As a result, I was able to bring it down to 520 bytes, in what I think is the worst case (ULFM + debug + peruse). We're still above the 512 limit, and without applying @edgargabriel suggestions I don't see how we can bring this further down.

I will make a PR out of my patch, and together with #11405 this will give us a clean slate to work from.

@edgargabriel
Copy link
Member

@bosilca do you want to file your pr maybe against #11405 to simplify review and the cherry-picking for the 5.0 branch?

@bosilca
Copy link
Member

bosilca commented Feb 12, 2023

@edgargabriel I was able to push a commit in your branch. The result is that sizeof(ompi_communicator_t) is at most 464 bytes in debug mode with PERUSE and ULFM enabled.

More than half of this, precisely 272 bytes, is accounted on two structures, that are mostly unused: the opal_infosubscriber_t with 144 bytes and opal_mutex_t with 128 bytes.

edgargabriel added a commit to edgargabriel/ompi that referenced this issue Feb 14, 2023
make the c_name element of the communicator structure a dynamic
element. This allows us to reduce the size of PREDEFINED_COMMUNICATOR_PAD
back to 512 to maintain backwards compatibility with the ompi 4.1.x release
series.

Reorder the communicator fields to reduce the struct size.
This brings the communicator size at 536 bytes with FT, PERUSE enabled
and compiled in debug mode.

Fixes issue open-mpi#11373

Signed-off-by: Edgar Gabriel <edgar.gabriel@amd.com>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
edgargabriel added a commit to edgargabriel/ompi that referenced this issue Feb 22, 2023
make the c_name element of the communicator structure a dynamic
element. This allows us to reduce the size of PREDEFINED_COMMUNICATOR_PAD
back to 512 to maintain backwards compatibility with the ompi 4.1.x release
series.

Reorder the communicator fields to reduce the struct size.
This brings the communicator size at 536 bytes with FT, PERUSE enabled
and compiled in debug mode.

Fixes issue open-mpi#11373

Signed-off-by: Edgar Gabriel <edgar.gabriel@amd.com>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 2d68804)
@gpaulsen
Copy link
Member

Resolved on main and v4.1.x

yli137 pushed a commit to yli137/ompi that referenced this issue Jan 10, 2024
make the c_name element of the communicator structure a dynamic
element. This allows us to reduce the size of PREDEFINED_COMMUNICATOR_PAD
back to 512 to maintain backwards compatibility with the ompi 4.1.x release
series.

Reorder the communicator fields to reduce the struct size.
This brings the communicator size at 536 bytes with FT, PERUSE enabled
and compiled in debug mode.

Fixes issue open-mpi#11373

Signed-off-by: Edgar Gabriel <edgar.gabriel@amd.com>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants