Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mpirun (master branch) error message for unknown and deprecated argument is wrong #10097

Closed
wzamazon opened this issue Mar 8, 2022 · 12 comments

Comments

@wzamazon
Copy link
Contributor

wzamazon commented Mar 8, 2022

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

master

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone from source. then configured with ./configure --prefix=/fsx/ALinux2/PortaFiducia/libraries/openmpi/master/install --with-hwloc=external --disable-man-pages

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

94169f1725a26a156fd73fca1b61df14ccd581ab 3rd-party/openpmix (v1.1.3-3429-g94169f17)
 6722bd767e0fccbe07193f59f4f13e5610c7dd6a 3rd-party/prrte (psrvr-v2.0.0rc1-4274-g6722bd767e)

Please describe the system on which you are running

  • Operating system/version: Amazon Linux 2
  • Computer hardware: Intel
  • Network type: EFA

Details of the problem

The error messages of mpirun from master branch for deprecated/unknown argument is wrong.

For example:

If I pass a deprecated argument like -machinefile

/fsx/ALinux2/PortaFiducia/libraries/openmpi/master/install/bin/mpirun --map-by ppr:4:node -machinefile hostfile ./t

I got

--------------------------------------------------------------------------
An unrecognized option was included on the (null) command line:

  Option: ppr:4:node

Please use the "(null) --help" command to obtain a list of all
supported options.
--------------------------------------------------------------------------
prterun: command line error (Out of resource)

This is super confusing because "ppr:4:node" is valid input for --map-by. It is -machinefile that is causing trouble. So there seems to be an off-by-1 error in printing the arugment.

On the other hand, If I pass an unknown argument like -i like:

/fsx/ALinux2/PortaFiducia/libraries/openmpi/master/install/bin/mpirun -i

I got

--------------------------------------------------------------------------
An unrecognized option was included on the (null) command line:

  Option: -i

Please use the "(null) --help" command to obtain a list of all
supported options.
--------------------------------------------------------------------------

The error message is more helpful, though the (null) should be fixed.

@awlauria
Copy link
Contributor

awlauria commented Mar 9, 2022

Working on a fix. Should post this morning

@awlauria
Copy link
Contributor

awlauria commented Mar 9, 2022

Fix for unknown argument "(null)":

openpmix/prrte#1257
openpmix/openpmix#2495

@awlauria
Copy link
Contributor

awlauria commented Mar 9, 2022

Regarding the deprecation warning, it appears you are missing a - on -machinefile, which is causing the parser to get a bit wonky.

it seems to work when adding a double-dash.

FWIW it doesn't seem to be in prrte's deprecated list. I don't think it is deprecated.

We could add single-dash machinefile to the list of those converted from single-dash to double dash (or convert all of them..). I feel like prrte used to do that, but it got dropped at some point?

@wzamazon
Copy link
Contributor Author

it appears you are missing a - on -machinefile, which is causing the parser to get a bit wonky.

Ah, I see. Thank you!

However, single dash is accepted by open mpi 4, so I think it should be handled by v5 (or at least print a proper warning).

We could add single-dash machinefile to the list of those converted from single-dash to double dash (or convert all of them..). I feel like prrte used to do that, but it got dropped at some point?

Yes, I remember prrte used to do that.

@awlauria
Copy link
Contributor

No worries, I can try posting that patch, but may not until next week.

The fixes for "(null)" are fixed in pmix/prte master, just need to update the submodule.

@wzamazon
Copy link
Contributor Author

I can confirm the "(null)" has been fixed by latest pmix/prte. Thank you!

awlauria added a commit to awlauria/prrte that referenced this issue Mar 11, 2022
This will preserve backwards compatability with the ompi v4
series with v5 and current master.

Refs open-mpi/ompi#10097

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
awlauria added a commit to awlauria/prrte that referenced this issue Mar 11, 2022
This will preserve backwards compatability with the ompi v4
series with v5 and current master.

First, this restores the silent conversion of all single dash
args to multi-dash arguments.

Second, it will track and eventually print all offending ompi
args in a show-help message.

It does this by appending all single dash arguments
to an array, along with their position found in the arg list.
This is key, since we don't really knoew where the user executable
is on the list, and it isn't easy to tell. Luckily, pmix_arg_parse()
will store where the ompi args stops in results->tail. Therefore
all we need to know is if each offending argument came before results->tail
or not to decide if a warning should be printed for it. This will prevent
prte from erronously warning if the user executable is taking a single dash
option.

Prrte will parse the app arguments again later, so any changes to them
here are thrown away. The executable arguments are not impacted.

Refs open-mpi/ompi#10097

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
@awlauria
Copy link
Contributor

@wzamazon I opened openpmix/prrte#1269 to try and fix the single dash issue if you want to test it.

awlauria added a commit to awlauria/prrte that referenced this issue Mar 11, 2022
This will preserve backwards compatability with the ompi v4
series with v5 and current master.

First, this restores the silent conversion of all single dash
args to multi-dash arguments.

Second, it will track and eventually print all offending ompi
args in a show-help message.

It does this by appending all single dash arguments
to an array, along with their position found in the arg list.
This is key, since we don't really knoew where the user executable
is on the list, and it isn't easy to tell. Luckily, pmix_arg_parse()
will store where the ompi args stops in results->tail. Therefore
all we need to know is if each offending argument came before results->tail
or not to decide if a warning should be printed for it. This will prevent
prte from erronously warning if the user executable is taking a single dash
option.

Prrte will parse the app arguments again later, so any changes to them
here are thrown away. The executable arguments are not impacted.

Refs open-mpi/ompi#10097

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
@wzamazon
Copy link
Contributor Author

I can confirm openpmix/prrte#1269 fixed the issue of single dash

@awlauria
Copy link
Contributor

awlauria commented Mar 18, 2022

This should be fixed in master with #10136. Needs back porting to v5.0.x

@awlauria
Copy link
Contributor

awlauria commented Mar 28, 2022

This should be fixed in the v5.0.x branch via the following submodule pointer update: #10161

for main: #10165 (This PR fixes a variety of issues, but updating the submodule here will fix this issue on main).

awlauria added a commit to awlauria/prrte that referenced this issue Mar 31, 2022
This will preserve backwards compatability with the ompi v4
series with v5 and current master.

First, this restores the silent conversion of all single dash
args to multi-dash arguments.

Second, it will track and eventually print all offending ompi
args in a show-help message.

It does this by appending all single dash arguments
to an array, along with their position found in the arg list.
This is key, since we don't really knoew where the user executable
is on the list, and it isn't easy to tell. Luckily, pmix_arg_parse()
will store where the ompi args stops in results->tail. Therefore
all we need to know is if each offending argument came before results->tail
or not to decide if a warning should be printed for it. This will prevent
prte from erronously warning if the user executable is taking a single dash
option.

Prrte will parse the app arguments again later, so any changes to them
here are thrown away. The executable arguments are not impacted.

Refs open-mpi/ompi#10097

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit 381a521)
awlauria added a commit to awlauria/prrte that referenced this issue Mar 31, 2022
This will preserve backwards compatability with the ompi v4
series with v5 and current master.

First, this restores the silent conversion of all single dash
args to multi-dash arguments.

Second, it will track and eventually print all offending ompi
args in a show-help message.

It does this by appending all single dash arguments
to an array, along with their position found in the arg list.
This is key, since we don't really knoew where the user executable
is on the list, and it isn't easy to tell. Luckily, pmix_arg_parse()
will store where the ompi args stops in results->tail. Therefore
all we need to know is if each offending argument came before results->tail
or not to decide if a warning should be printed for it. This will prevent
prte from erronously warning if the user executable is taking a single dash
option.

Prrte will parse the app arguments again later, so any changes to them
here are thrown away. The executable arguments are not impacted.

Refs open-mpi/ompi#10097

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit 381a521)
@awlauria
Copy link
Contributor

awlauria commented Mar 31, 2022

Well, unfortunately I neglected to bring the fix back to the v2.1 prrte release branch. So my comment above about it being fixed in v5.0.x is incorrect (and thus is not in rc4).

rhc54 pushed a commit to openpmix/prrte that referenced this issue Apr 1, 2022
This will preserve backwards compatability with the ompi v4
series with v5 and current master.

First, this restores the silent conversion of all single dash
args to multi-dash arguments.

Second, it will track and eventually print all offending ompi
args in a show-help message.

It does this by appending all single dash arguments
to an array, along with their position found in the arg list.
This is key, since we don't really knoew where the user executable
is on the list, and it isn't easy to tell. Luckily, pmix_arg_parse()
will store where the ompi args stops in results->tail. Therefore
all we need to know is if each offending argument came before results->tail
or not to decide if a warning should be printed for it. This will prevent
prte from erronously warning if the user executable is taking a single dash
option.

Prrte will parse the app arguments again later, so any changes to them
here are thrown away. The executable arguments are not impacted.

Refs open-mpi/ompi#10097

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit 381a521)
awlauria added a commit to awlauria/ompi that referenced this issue Apr 6, 2022
Changes:

f3828e8307 - Revise show_help to use PMIx IOF
9998bfadad - schizo/ompi: Convert all single dashes to double dashes.
744710d33f - prte: check if dvm actually got set up

Refs:

open-mpi#10097
open-mpi#10157

Note: There are no OpenPMIx changes to pull in at this time (4/06/2022).

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
@awlauria
Copy link
Contributor

awlauria commented Apr 7, 2022

PR #10230 was merged to bring this fix to v5.0.x. This will be in rc5. Closing.

@awlauria awlauria closed this as completed Apr 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants