Fix / revamp mpirun --output-filename #7133

Closed
jsquyres opened this issue Nov 1, 2019 · 4 comments

@jsquyres
Member

jsquyres commented Nov 1, 2019

This issue started with #7095. However, it has grown to encompass several things, so I'm opening a new issue to gather them all into one spot.

The initial issue is that mpirun's --output-filename behavior no longer matches what is described in the mpirun(1) man page. Specifically: it doesn't just output to a single file per process any more; mpirun now creates a directory for each MPI process and outputs a stdout and stderr file in there.

  1. We should probably rename this behavior to be --output-directory.
    • The new behavior needs to be documented: output to DIR/JOBID/rank.N/stdout|stderr
    • Can also be combined with --merge-stderr-to-stdout (right? test this...)
    • Be sure to mention :nojobid (omit the job ID in the directory hierarchy) and :nocopy (don't also emit to stdout/stderr), and that they can be combined into a single comma-delimited list
    • Mention efficiency of :nocopy (i.e., no IOF used to send back to mpirun) -- probably nearly as efficient as the app writing to its own local files...?
  2. BUG FIX: If you specify an invalid suffix (e.g., :noooooocopy), the user is not notified.
  3. Need to think through how to use this feature over time from user's perspective
    • What to do in v2.x
    • What to do in v3.0.x (behavior changed compared to v2.x)
    • What to do in v3.1.x
    • What to do in v4.0.x (?added --output-directory / deprecated --output-filename?)
    • What to do in v5.0.x (?--output-directory only?)
  4. Make sure to mention killing/deprecating --output-file in NEWS
    • ...unless the old --output-file behavior is resurrected / preserved...? That's an option, if someone wants to do it.
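
To make items 1 and 2 concrete, here is a minimal Python sketch of the layout and suffix handling described above. It is not Open MPI's implementation: the function names (`parse_output_option`, `output_paths`) are hypothetical, and it assumes the option value has the form `DIR[:suffix,suffix]` with only `nojobid` and `nocopy` recognized.

```python
from pathlib import PurePosixPath

# Suffixes named in this issue; anything else is treated as an error.
KNOWN_SUFFIXES = {"nojobid", "nocopy"}


def parse_output_option(value):
    """Split 'DIR[:opt1,opt2]' into (directory, set of suffixes).

    Assumption: the first ':' separates the directory from the
    comma-delimited suffix list, so DIR itself must not contain ':'.
    """
    directory, _, suffix = value.partition(":")
    options = {s.strip() for s in suffix.split(",") if s.strip()}
    unknown = options - KNOWN_SUFFIXES
    if unknown:
        # Item 2: tell the user instead of silently ignoring the typo.
        raise ValueError("unknown suffix: " + ", ".join(sorted(unknown)))
    return directory, options


def output_paths(directory, jobid, rank, options=frozenset()):
    """Return (stdout_path, stderr_path) under DIR/JOBID/rank.N/."""
    base = PurePosixPath(directory)
    if "nojobid" not in options:
        base = base / str(jobid)  # :nojobid omits this level
    rank_dir = base / f"rank.{rank}"
    return rank_dir / "stdout", rank_dir / "stderr"
```

With this sketch, `parse_output_option("out:noooooocopy")` raises a `ValueError` rather than being accepted silently, which is the behavior item 2 asks for.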
@ggouaillardet
Contributor

What does nocopy do?
If it only writes to files but not stdout/stderr, then we still need to involve IOF, since the MPI tasks might be running on nodes that do not mount the filesystem mpirun is writing to.
For performance improvements, I think we would need another option such as noforward, and this should probably not be the default behavior.

@jsquyres
Member Author

Hey @rhc54: did you fix the issue that a user is not notified if they specify an invalid suffix?

@rhc54
Contributor

rhc54 commented Nov 14, 2019

Yes, I did

