Intro nonblocking: no serialization & no deadlocks #122

Closed
RolfRabenseifner opened this issue Feb 18, 2019 · 26 comments
Labels
  • had reading: Completed the formal proposal reading
  • passed final vote: Passed the final formal vote
  • passed first vote: Passed the first formal vote
  • scheduled reading: Reading is scheduled for the next meeting
  • wg-p2p: Point-to-Point Working Group

Comments

@RolfRabenseifner

RolfRabenseifner commented Feb 18, 2019

Problem

Overlapping communication with other communication (that is, sends with receives) is a significant use case of nonblocking communication (e.g., the first example code in the book "Using MPI"), mainly to prevent deadlocks and serialization of communication. Nevertheless, the intro of Section 3.7 Nonblocking Communication in the MPI standard mentions only the overlap of computation and communication. As a result, many books, tutorials, and courses never discuss using nonblocking communication to prevent serialization. Consequently, a lot of application code is inefficient because of serialization, wasting a significant amount of compute power and electric energy (and producing CO2).

Proposal

Start Section 3.7 Nonblocking Communication, MPI-3.1, page 47 lines 7-11 with a new paragraph and remove outdated text.
I asked Bill Gropp at the Zurich meeting for a new intro. See his text proposal below.
The text after the struck-out text is unchanged text from MPI-3.1.

Changes to the Text

The first paragraph of Section 3.7 should be substituted by:

Section 3.7 Nonblocking Communication

Nonblocking communication is important both for reasons of correctness and performance.
For complex communication patterns, the use of only blocking communication (without buffering) is difficult because the programmer must ensure that each send is matched with a receive in an order that avoids deadlock. For communication patterns that are determined only at run time, this is even more difficult. Using nonblocking communication avoids this problem, allowing programmers to express complex and possibly dynamic communication patterns without needing to ensure that all sends and receives are issued in an order that prevents deadlock (see Section 3.5 and the discussion of “safe” programs). Nonblocking communication also allows for the overlap of communication with different communication operations, e.g., to prevent the serialization of such operations, and for the overlap of communication with computation. Whether an implementation is able to accomplish an effective (from a performance standpoint) overlap of operations depends on the implementation itself and the system on which the implementation is running. Using nonblocking operations permits an implementation to overlap communication with computation, but does not require it to do so.
One can improve performance on many systems by overlapping communication and computation. This is especially true on systems where communication can be executed autonomously by an intelligent communication controller. Light-weight threads are one mechanism for achieving such overlap. An alternative mechanism that often leads to better performance is to use nonblocking communication.
A nonblocking send start call initiates the send operation, but does not complete it. The send start call can return before the message was copied out of the send buffer. A separate send complete call is needed to complete the communication, i.e., to verify that the data has been copied out of the send buffer. With suitable hardware, the transfer of data out of the sender memory may proceed concurrently with computations done at the sender after the send was initiated and before it completed. Similarly, a nonblocking receive start call initiates the receive operation, but does not complete it. The call can return before a message is stored into the receive buffer. A separate receive complete call is needed to complete the receive operation and verify that the data has been received into the receive buffer. With suitable hardware, the transfer of data into the receiver memory may proceed concurrently with computations done after the receive was initiated and before it completed. The use of nonblocking receives may also avoid system buffering and memory-to-memory copying, as information is provided early on the location of the receive buffer.

The latest PDF (with change-bars "ticket122", see section 3.7 on page 49) for reading in Albuquerque in Dec 2019 is here: mpi-report-ticket122-2019-NOV-25.pdf

Final version after all reading (from Albuquerque, Dec. 2019) plus no-no-vote changes until Portland (Feb. 2020):
mpi-report-issue122-NB-intro-2020-02-20b-annotated.pdf

Impact on Implementations

None.

Impact on Users

None directly. In the long term, users may better understand MPI nonblocking communication and its use cases.

References

Pull request

The PR is https://github.com/mpi-forum/mpi-standard/pull/99

@dholmes-epcc-ed-ac-uk
Member

Amended text in PDF for reading:
mpi-report-ticket122.pdf

@dholmes-epcc-ed-ac-uk added the "scheduled reading" label Feb 18, 2019
@RolfRabenseifner
Author

Annotated version of the pdf file above:
mpi-report-ticket122-annotated.pdf

@jeffhammond
Member

I have no opinion on this change but would like to dispute both justifications:

  1. The MPI standard is not a user guide. Shortcomings in MPI-related educational materials do not justify text in the MPI standard. People who do not understand the opportunities for concurrency in nonblocking communication have no business writing MPI books or tutorials.
  2. MPI serialization is not a significant contributor to energy waste or climate change. Load-imbalance is a far greater source of computational waste. The MPI standard cannot solve this problem and should not attempt to solve problems like this, no matter how members of the MPI Forum feel about them.

@jsquyres
Member

jsquyres commented Mar 7, 2019

@RolfRabenseifner and I talked about this and came up with alternate language:

The purpose of nonblocking communication is to permit overlapping communication with any other activity. For example, overlapping multiple send and receive operations can allow greater efficiency and can help resolve serialization and deadlock issues.

And then lead into the existing text: "Another advantage is that one can improve performance ..."

@wgropp

wgropp commented Mar 7, 2019 via email

@RolfRabenseifner
Author

Minutes from the reading of this issue on March 6, 2019:

  • The proposal is okay (straw poll: all yes except one no).
  • But we should add a sentence at the beginning that nonblocking is in general for any overlapping.
  • Additionally, I should add a subsection later in the section with an example in C or pseudocode (not in Fortran, because all other examples are in Fortran) that shows the benefit of non-serializing communication with nonblocking calls.
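
The serialization and deadlock effects referred to in the last bullet can be illustrated without an MPI installation by a toy model in plain C. This is a sketch under stated assumptions, not the example discussed at the reading: the function name exchange_steps and the limit MAX_RANKS are made up for illustration, every send is assumed to be synchronous (no buffering), every transfer is assumed to take one time step, and the nonblocking case assumes an implementation that actually progresses all transfers concurrently, which the standard permits but does not require.

```c
#include <assert.h>

#define MAX_RANKS 64

/* Toy model (not MPI): each rank sends one message to its right neighbor
 * and receives one from its left neighbor.
 *   periodic    != 0: ring (last rank sends to rank 0); otherwise open chain
 *   nonblocking != 0: every receive is posted before the rank's own send
 *                     completes; otherwise the rank does "send, then receive"
 * Returns the number of unit time steps until all messages are delivered,
 * or -1 if no transfer can ever proceed (deadlock). */
int exchange_steps(int nranks, int periodic, int nonblocking)
{
    assert(nranks >= 2 && nranks <= MAX_RANKS);

    int send_pending[MAX_RANKS], recv_pending[MAX_RANKS];
    for (int r = 0; r < nranks; r++) {
        /* In an open chain the end ranks have no partner on one side. */
        send_pending[r] = periodic || r < nranks - 1;
        recv_pending[r] = periodic || r > 0;
    }

    int steps = 0;
    for (;;) {
        int remaining = 0;
        for (int r = 0; r < nranks; r++)
            remaining += send_pending[r];
        if (remaining == 0)
            return steps;

        /* Snapshot at the start of the step: a blocking rank reaches its
         * receive only after its own send has completed. */
        int posted[MAX_RANKS];
        for (int r = 0; r < nranks; r++)
            posted[r] = recv_pending[r] && (nonblocking || !send_pending[r]);

        int transfers = 0;
        for (int r = 0; r < nranks; r++) {
            int q = (r + 1) % nranks;
            if (send_pending[r] && posted[q]) {
                send_pending[r] = 0;
                recv_pending[q] = 0;
                transfers++;
            }
        }
        if (transfers == 0)
            return -1;  /* everyone waits in a send: deadlock */
        steps++;
    }
}
```

In this model, exchange_steps(4, 0, 0) returns 3 (the blocking chain delivers its messages one after another), exchange_steps(4, 1, 0) returns -1 (the blocking ring deadlocks), and both nonblocking variants return 1.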

Minutes from the re-reading on March 7, 2019:

  • Text as proposed by @jsquyres (at 9:31 AM EST).
  • Straw poll: all are in favor of this rewording (this was before Bill's comment).

@RolfRabenseifner
Author

@wgropp : I'm not sure whether your comment supports our proposal or not.

@dholmes-epcc-ed-ac-uk
Member

@wgropp The proposed text is using "overlap" in a general sense, i.e. to overlap with any other activity (perhaps computation - e.g. to hide latency, perhaps other communication - e.g. to avoid deadlock or serialisation of message transfers). It may be more usual to use "overlap" in the narrow sense of "asynchronous communication to permit concurrent computation" - leading to confusion.

All: should we use a different word?

@wgropp

wgropp commented Mar 8, 2019

The first sentence in the proposal is incorrect. The purpose of nonblocking is not to enable the overlap of communication and computation. While it is true that having some form of nonblocking operation is required for overlap, that is an additional benefit, not the purpose.

It would be better to start with the original reason - then note that this makes overlap possible, then talk about the benefits of having asynchronous or overlapped communication.

@wgropp

wgropp commented Mar 8, 2019

I almost hate to say it, but nonblocking is a semantic concept independent of MPI progress. The proposal confuses the two.

@wgropp

wgropp commented Mar 8, 2019

And in response to Dan's comment about the language covering this case, that's what I meant by "in the broadest sense". My concern is that almost everyone reading the proposed text will not understand that sense, and further will come to expect asynchronous execution, which is not part of the semantics of MPI non-blocking.

@RolfRabenseifner
Author

Based on a new text proposal from Bill Gropp, I updated the proposed solution above in the description of this issue.
Caution: The PR 99 is not yet updated.

@dholmes-epcc-ed-ac-uk
Member

The PR https://github.com/mpi-forum/mpi-standard/pull/99 is now updated.

The latest PDF (with change-bars, see section 3.7 on page 49) for reading in Albuquerque in Dec 2019 is here:
mpi-report-ticket122-2019-NOV-25.pdf

@RolfRabenseifner
Author

The PR mpi-forum/mpi-standard#99 is now updated.

The latest PDF (with change-bars, see section 3.7 on page 49) for reading in Albuquerque in Dec 2019 is here:
mpi-report-ticket122-2019-NOV-25.pdf

Dear Dan,
The change-log is still missing. You can find the text from Albuquerque in the description of this issue:

TBD: Change-log entry, so that users, trainers, and book authors can detect the change to the intro and adopt it in their software and teaching material.
Proposal for the Change-log:
Section 3.7 on page 49.
The introduction of MPI nonblocking communication was corrected to better describe correctness and performance reasons for the use of nonblocking communication.

Please can you update your pull request and provide a new pdf before our 2 week deadline.
Then a no-no-vote for the change-log and a first vote can be scheduled for Portland.

@wgropp

wgropp commented Jan 30, 2020 via email

@dholmes-epcc-ed-ac-uk
Member

I have no strong opinion one way or the other w.r.t. a change-log entry for this clarification.

Also, @RolfRabenseifner this is your ticket, not mine. I will support it with my vote because the wording change is valuable, but I don't have time this close to the 2-week announcement deadline to work on it. Sorry.

@RolfRabenseifner
Author

Dear Bill,

if you expect that MPI users and MPI implementors are the only readers of the MPI standard, then you would be right.

But we also have to serve the book writers and the providers of MPI training and teaching, as well as of MPI training/teaching materials.

So many teach MPI nonblocking only for overlapping communication with computation.

The reason for this is that

  • MPI-1 to MPI-3.1 start the nonblocking chapter with
    "One can improve performance on many systems by overlapping communication and computation.
    This is especially true on systems where communication can be executed autonomously."
  • many books on MPI also start their nonblocking chapter with
    "One can improve performance on many systems by overlapping communication and computation.
    This is especially true on systems where communication can be executed autonomously."
  • and course material then mentions nonblocking MPI only in this context.

I have now been advertising the book "Using MPI" by Gropp, Lusk, and Skjellum for more than 20 years,
but even there, Chapter 4.4 Using Nonblocking Communications starts with

"On most parallel computers, moving data from one process to another takes
more time than moving or manipulating data within a single process. For example,
on one modern parallel computer, each process can compute up to 10 billion
floating-point results per second, but can only move roughly two hundred million
words per second between processes. To keep a program from being slowed down
(also described as "starved for data"), many parallel computers allow users to
start sending (and receiving) several messages and to proceed with other operations."

And if you google for
"One can improve performance on many systems by overlapping communication and computation."
you can directly find many such examples:

If such authors try to keep their teaching material up-to-date, then I expect
that they should at least look at the change-log of each new version of the MPI standard.

And therefore, this changelog entry is so important.

You can never expect that they start to read the whole MPI standard again
from the beginning to find relevant changes.

This still means that we may have to live with wrong stories
about MPI nonblocking communication for the next 10 years, but the changelog plus the new text
may result in better teaching, understanding, and use of MPI.

When do you expect that we will see a corrected new edition of "Using MPI"?
And would it have this section corrected without the effort we are doing here in #122?

Best regards
Rolf

@RolfRabenseifner
Author

Updates for Portland (in PR99 and pdf):

  • Removal of change-macros
  • Change-log
  • General index

PDF:
mpi-report-issue122-NB-intro-2020-02-04-annotated.pdf

@RolfRabenseifner added the "had reading" label Feb 4, 2020
@wesbland
Member

There was a straw vote in Portland about whether or not to have a changelog entry:

Yes: 15
No: 3
Abstain: 5

@RolfRabenseifner
Author

mpi-report-issue122-NB-intro-2020-02-20-annotated.pdf
Latest version with two additional small changes from reading today for no-no-vote and 1st vote in Portland

@RolfRabenseifner
Author

Result of the two no-no-votes this morning, Feb. 20, 2020, in Portland:

  • adding the changelog: failed
  • correction to the text: passed
    Therefore, I'll remove the changelog entry from the LaTeX source.

@RolfRabenseifner
Author

Final version after all reading (from Albuquerque, Dec. 2019) plus no-no-vote changes until Portland (Feb. 2020):
mpi-report-issue122-NB-intro-2020-02-20b-annotated.pdf

@wesbland
Member

This proposal did not meet ballot quorum for one no-no vote (adding a changelog entry) at the February 2020 meeting:

Yes - 15
No - 1
Abstain - 8

This proposal passed another no-no vote (updating the normative text) at the February 2020 meeting:

Yes - 23
No - 0
Abstain - 1

@wesbland added the "passed first vote" label Feb 21, 2020
@wesbland
Member

This proposal passed a first vote at the February 2020 meeting:

Yes - 24
No - 0
Abstain - 0

@wesbland
Member

This passed a second vote on 2020-06-30.

https://www.mpi-forum.org/meetings/2020/06/votes

@wesbland added the "passed final vote" label Jul 11, 2020

7 participants