removing an overly aggressive error check in binding
In bind_generic() there's a loop that picks a starting trg_obj and then walks next = trg_obj->next_cousin until it has made total_cpus assignments. But the code doesn't accept that those assignments might not land on adjacent objects.

Example:

    % mpirun -np 2 --report-bindings --map-by ppr:2:node:pe=3 \
        --cpu-set 4,5,7,8,9,11 -bind-to hwthread:overload-allowed
    > MCW 0 : [..../BB.B/..../....]
    > MCW 1 : [..../..../BB.B/....]

It will want to assign 3 cpus and will loop through

    trg_obj 00001     (with ncpus 1)
    trg_obj 000001    (with ncpus 1)
    trg_obj 0000001   (with ncpus 0)
    trg_obj 000000011 (with ncpus 1)

On the third entry the original code would see num_bound for the object become too high for its ncpus and conclude that oversubscription was happening.

I changed it to ++num_bound, i.e. to use that object, only if the object still has cpus in its cpuset after it is intersected with the allowed/available masks.

The error message from the original code (if you remove the overload-allowed) would be

    > A request was made to bind to that would result in binding more
    > processes than cpus on a resource:
    >    Bind to:     HWTHREAD
    >    Node:        ...
    >    #processes:  1
    >    #cpus:       0

Signed-off-by: Mark Allen <markalle@us.ibm.com>
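
The idea behind the change can be illustrated with a small standalone sketch. This is not the actual bind_generic() code: sketch_obj_t, sketch_bind() and the plain unsigned long bitmasks are hypothetical stand-ins for the real topology objects and cpusets, and the sketch assumes each object contributes all of its usable cpus. It only shows the shape of the check: an object whose cpuset is empty after intersecting with the available mask is skipped without touching num_bound, so it cannot trigger the oversubscription error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a topology object (not the real hwloc type). */
typedef struct sketch_obj {
    unsigned long cpuset;            /* bit i set => object contains cpu i */
    unsigned num_bound;              /* how many procs were bound here */
    struct sketch_obj *next_cousin;  /* next object at the same level */
} sketch_obj_t;

/* Count the set bits in a mask. */
static unsigned count_bits(unsigned long v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Walk the cousin chain from `start`, collecting cpus until `total_cpus`
 * have been gathered. Objects with no usable cpus are skipped and their
 * num_bound is left alone. Returns the number of cpus collected, or -1
 * if a usable object would be oversubscribed and overload is not allowed.
 */
static int sketch_bind(sketch_obj_t *start, unsigned long available,
                       unsigned total_cpus, int overload_ok,
                       unsigned long *bound)
{
    unsigned got = 0;
    sketch_obj_t *obj;

    assert(start != NULL && bound != NULL);
    *bound = 0;

    for (obj = start; obj != NULL && got < total_cpus; obj = obj->next_cousin) {
        unsigned long usable = obj->cpuset & available;
        unsigned ncpus = count_bits(usable);

        if (ncpus == 0) {
            continue;  /* nothing usable here: skip, do not ++num_bound */
        }
        obj->num_bound++;
        if (obj->num_bound > ncpus && !overload_ok) {
            return -1;  /* genuine oversubscription of a usable object */
        }
        *bound |= usable;
        got += ncpus;
    }
    return (int)got;
}
```

With four single-cpu objects for cpus 4 through 7 chained by next_cousin, an available mask of cpus 4,5,7,8,9,11 and total_cpus of 3, the walk skips the object for cpu 6 and binds cpus 4, 5 and 7 without reporting an error, matching the first rank in the example above.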