--cpu-list does not bind in master nor 4.0 branch #6540

Closed
awlauria opened this issue Mar 28, 2019 · 6 comments


@awlauria
Contributor

awlauria commented Mar 28, 2019

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

master/4.0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Git clone.

Please describe the system on which you are running

  • Operating system/version: RHEL 7.6
  • Computer hardware: Power9
  • Network type:

Details of the problem

This can be reproduced with the following command. The --cpu-list option appears to be ignored, or processed incorrectly, when it comes to binding:

[awlauria@c685f10n17 master]$ mpirun --report-bindings --np 2 --cpu-list 3,4 --bind-to core hostname
c685f10n17
[c685f10n17:102892] MCW rank 0 is not bound (or bound to all available processors)
[c685f10n17:102892] MCW rank 1 is not bound (or bound to all available processors)
c685f10n17

Changing --bind-to core to hwthread or removing it completely yields the same results.
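
As a cross-check that does not rely on --report-bindings, each rank can print the affinity mask the kernel actually applied to it. The following is a minimal sketch, assuming a Linux node (it reads /proc/self/status); show_affinity is a hypothetical helper name and not part of Open MPI.

```shell
#!/bin/sh
# Print the CPUs the current process is allowed to run on (Linux only).
# show_affinity is a hypothetical helper name, not an Open MPI tool.
show_affinity() {
    # Cpus_allowed_list is the kernel's view of the affinity mask.
    # awk is the process that reads /proc/self/status here; as a child
    # of this shell it inherits the same mask.
    list=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status 2>/dev/null)
    echo "$(uname -n) pid $$: cpus ${list:-unknown}"
}

show_affinity
```

Saved as an executable script (affinity.sh is an assumed name), it could stand in for hostname in the command above, for example `mpirun --np 2 --cpu-list 3,4 --bind-to core ./affinity.sh`. If binding took effect, each rank should report a narrow CPU list rather than every CPU on the node.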

Another somewhat related issue is that --cpu-list cannot be used with --map-by. Is this intended behavior?

[awlauria@c685f10n17 master]$ mpirun -np 2 --map-by ppr:2:node --cpu-list 2,3 --bind-to core --report-bindings --tag-output hostname
--------------------------------------------------------------------------
Conflicting directives for mapping policy are causing the policy
to be redefined:

  New policy:   RANK_FILE
  Prior policy:  UNKNOWN

Please check that only one policy is defined.
--------------------------------------------------------------------------

Thanks.

@gpaulsen
Member

@markalle has a PR to address this and will be posting it.

@awlauria
Contributor Author

@markalle opened #6584 to address this.

@awlauria
Contributor Author

It looks like it's still an issue on v4.0.6:

[awlauria@f8n02 v4.0.x_5_21]$ ./exports/bin/mpirun --report-bindings --np 2 --cpu-list 3,4 --bind-to core hostname
f8n02
[f8n02:1693364] MCW rank 0 is not bound (or bound to all available processors)
[f8n02:1693364] MCW rank 1 is not bound (or bound to all available processors)
f8n02

@awlauria
Contributor Author

awlauria commented Jun 29, 2021

As far as I can tell, #6584 was never backported to the v4 series, which is probably why I can still reproduce this.

This seems to work on v5.0.x/master with PRRTE, using a slightly different run line:

[awlauria@f8n02 v5.0.x]$ ./exports/bin/mpirun --report-bindings --np 2 --cpu-list 3,4 --bind-to core hostname
--------------------------------------------------------------------------
WARNING: A deprecated command line option was used.

  Deprecated option:   --report-bindings
  Corrected option:    --display bind

We have updated this for you and will proceed. However, this will be treated
as an error in a future release. Please update your command line.
--------------------------------------------------------------------------

--------------------------------------------------------------------------
WARNING: A deprecated command line option was used.

  Deprecated option:   --cpu-list
  Corrected option:    --map-by :PE-LIST=3,4

We have updated this for you and will proceed. However, this will be treated
as an error in a future release. Please update your command line.
--------------------------------------------------------------------------


******* Corrected cmd line: ./exports/bin/mpirun --display bind --np 2 --map-by :PE-LIST=3,4 --bind-to core hostname


[f8n02:1693833] MCW rank 1 bound to package[0][core:4]
[f8n02:1693833] MCW rank 0 bound to package[0][core:3]
f8n02
f8n02
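
For scripts that still carry the old options, the two substitutions reported in the warnings above can be applied ahead of time. This is only a sketch: translate_args is a hypothetical helper, it covers just the two mappings shown in that output, and it flattens the arguments into one string, so arguments containing spaces are not preserved.

```shell
#!/bin/sh
# Rewrite the two deprecated options from the warnings above into their
# v5.0.x replacements; every other argument passes through unchanged.
# translate_args is a hypothetical helper name, not part of Open MPI.
translate_args() {
    out=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --report-bindings)
                out="$out --display bind" ;;
            --cpu-list)
                # The CPU list is the next argument, e.g. "3,4".
                if [ "$#" -lt 2 ]; then
                    echo "translate_args: --cpu-list needs a value" >&2
                    return 1
                fi
                out="$out --map-by :PE-LIST=$2"
                shift ;;
            *)
                out="$out $1" ;;
        esac
        shift
    done
    # Drop the leading space left by the first append.
    echo "${out# }"
}

translate_args mpirun --report-bindings --np 2 --cpu-list 3,4 --bind-to core hostname
```

For the command in this comment, the function prints the same option sequence as the "Corrected cmd line" that mpirun reported: `mpirun --display bind --np 2 --map-by :PE-LIST=3,4 --bind-to core hostname`.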

@awlauria
Contributor Author

@markalle @gpaulsen fyi ^^

@awlauria
Contributor Author

Closing as fixed in v5.
