User-defined op with derived datatypes yields space-inefficient reduce #339

Open
mpiforumbot opened this issue Jul 24, 2016 · 2 comments

Comments

@mpiforumbot

mpiforumbot commented Jul 24, 2016

Originally by jdinan on 2012-06-06 13:44:07 -0500


Description

Currently, using a derived datatype with an MPI reduction operation also requires the use of a user-defined MPI_Op. The function that implements an MPI_Op has the following C prototype:

void op_fcn(void *in, void *inout, int *count, MPI_Datatype *dtype)

Note that user-defined operations accept two buffers but only one count and one datatype. Because of this, both buffers must have the layout described by that count and datatype.

Consider a reduction on a column of a large row-major array. We can easily do a reduce operation directly on the column using an MPI vector datatype. Because this is not a built-in datatype, we must also provide a user-defined op to the reduction operation. The user-defined op expects all data to have the same layout because it takes only one datatype/count. Thus, MPI must reconstruct the sender's entire array before invoking the user-defined op, resulting in severe space inefficiency for this operation.
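To put a rough number on the overhead, the sketch below compares the bytes that actually carry column data with the bytes spanned by the strided layout that a temporary buffer would have to mirror. It assumes an n_rows x n_cols row-major array of doubles and a column described by something like MPI_Type_vector(n_rows, 1, n_cols, MPI_DOUBLE). The helper names are hypothetical and no MPI calls are made.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of real payload in one column: n_rows doubles. */
size_t column_packed_bytes(size_t n_rows)
{
    return n_rows * sizeof(double);
}

/* Bytes spanned by the strided column layout, measured from the first
 * element of the column to the end of its last element. This matches
 * the extent of a vector type with blocklength 1 and stride n_cols. */
size_t column_spanned_bytes(size_t n_rows, size_t n_cols)
{
    assert(n_rows > 0 && n_cols > 0);
    return ((n_rows - 1) * n_cols + 1) * sizeof(double);
}
```

For a 1000 x 1000 array with 8-byte doubles, the column holds 8,000 bytes of data, while the layout spans 7,992,008 bytes, so a temporary buffer with the sender's layout is roughly 1000 times larger than the data being reduced.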

A test case is attached to the ticket that demonstrates the memory consumption issue.

Extended Scope

none.

History

none.

Proposed Solution

Define an MPI_Op that accepts one datatype for each buffer:

void op_fcn(void *in, int *count_in, MPI_Datatype *dtype_in, void *inout, int *count_inout, MPI_Datatype *dtype_inout)

This would allow MPI to pass one buffer in its packed form rather than recreating the layout it had at the source.

Such an op could be challenging for a user to implement, so mechanisms to simplify the task should be investigated. One possibility is to define an op that takes two datatypes and one count. The MPI implementation would then have to transform one or both datatypes so that their individual units are congruent. This seems doable for reductions, since all processes must pass the same datatype.
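As an illustration of what such a two-layout op might look like from the user's side, here is a minimal sketch of a sum reduction in which each buffer carries its own count and layout. It is not MPI code: layout_t is a hypothetical stand-in for MPI_Datatype that only records an element stride, and sum_two_layouts is an invented name that mirrors the shape of the proposed six-argument prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an MPI datatype: doubles separated by
 * `stride` elements. A stride of 1 means contiguous (packed) data. */
typedef struct {
    size_t stride;
} layout_t;

/* Sum reduction where `in` and `inout` each have their own count and
 * layout, so a packed incoming buffer can be combined directly with a
 * strided local buffer. */
void sum_two_layouts(const void *in, const int *count_in,
                     const layout_t *lay_in,
                     void *inout, const int *count_inout,
                     const layout_t *lay_inout)
{
    const double *src = in;
    double *dst = inout;

    /* Both sides must describe the same number of elements. */
    assert(*count_in == *count_inout);

    for (int i = 0; i < *count_in; i++)
        dst[(size_t)i * lay_inout->stride] += src[(size_t)i * lay_in->stride];
}
```

With this shape, the incoming contribution can stay packed (stride 1) while the local column keeps its stride of n_cols, so no temporary copy in the sender's layout is needed.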

Impact on Implementations

Impact on Applications and Users

Currently, reductions with derived datatypes are extremely inefficient. Fixing this issue would provide a significant performance enhancement.

Alternative Solutions

Several alternative solutions are possible:

  1. Users can pack data before calling MPI_Reduce to avoid this problem.
  2. An MPI implementation could pack both the in and inout buffers and pass both packed buffers to the user-defined operation. When packed, both should share the same datatype and count. However, this approach still has significant space overhead.
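
The first alternative can be sketched in plain C as follows, assuming a row-major array of doubles. The function name pack_column is hypothetical, and the MPI call itself is left out so the sketch stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Copy column `col` of a row-major n_rows x n_cols array into a
 * contiguous buffer that holds n_rows doubles. */
void pack_column(const double *array, size_t n_rows, size_t n_cols,
                 size_t col, double *packed)
{
    assert(col < n_cols);
    for (size_t r = 0; r < n_rows; r++)
        packed[r] = array[r * n_cols + col];
}
```

The packed buffer can then be passed to MPI_Reduce with a count of n_rows, MPI_DOUBLE, and a built-in op such as MPI_SUM, at the cost of one extra copy of the column on each process.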
@mpiforumbot

Originally by jdinan on 2012-06-06 13:44:48 -0500


Attachment added: reduce_user_dt_and_op.c (1.3 KiB)
Test case, which demonstrates memory consumption problem.

@mpiforumbot

Originally by jhammond on 2014-09-09 04:57:17 -0500


We should also try to support MPI_IN_PLACE in user-defined reductions with this ticket. I'll add the text later.
