ompi/schizo: Expose "--mca" when parsing command line. #1311
Thanks @awlauria! That looks good to me, but based on the previous exchanges on that topic, I do not think I am qualified to review this PR.
Hmmm...no, that's not quite correct. This will only handle the generic "mca" option. It should check for the "--omca" option as well. We have already converted the PRRTE and PMIx generic options prior to passing the cmd line to the schizo component, so those should be okay. I'm not convinced this is the correct fix, but it might be part of it. The looping described by @ggouaillardet is simply incorrect and may be contributing to the problem. I'd prefer to wait until that gets unraveled so we understand how this all fits together. Otherwise, we may chase our tails on this one.
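To make the point about handling both options concrete, here is a minimal sketch of a scan that accepts "--mca" and "--omca" alike. It is not the schizo component's code: `mca_param_t`, `is_mca_flag`, and `collect_mca_params` are hypothetical names invented for illustration, and the sketch assumes each flag is followed by exactly a name and a value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for one "<flag> <name> <value>" triple. */
typedef struct {
    const char *name;
    const char *value;
} mca_param_t;

/* Return 1 if arg is one of the two flags discussed in this thread. */
static int is_mca_flag(const char *arg)
{
    return 0 == strcmp(arg, "--mca") || 0 == strcmp(arg, "--omca");
}

/* Collect up to max params from argv; returns the number stored.
 * A flag without both a name and a value after it is skipped. */
static size_t collect_mca_params(int argc, char **argv,
                                 mca_param_t *out, size_t max)
{
    size_t n = 0;

    assert(NULL != argv && NULL != out);
    for (int i = 1; i < argc && n < max; i++) {
        if (is_mca_flag(argv[i]) && i + 2 < argc) {
            out[n].name = argv[i + 1];
            out[n].value = argv[i + 2];
            n++;
            i += 2; /* step over the name and value just consumed */
        }
    }
    return n;
}
```

Keeping the flag test in one helper means a later decision about which spellings to accept touches a single place.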
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
8b700eb to 22d23c3
Thanks, added --omca. At worst this makes the |
Okay - we can deal with any issues as they arise. We need to resolve this loop logic issue prior to release, but it might be another couple of weeks before I can get to it.
FWIW: here is what should be happening. PRRTE should call "setup_application" with a directive that we only harvest envars, and specifying the programming model ("ompi" or whatever). This is done by the user-facing launch tool (e.g., We then process that using the command line, altering the env array as required. This is the env that is included in the When PRRTE goes to launch, it again calls "setup_application", but this time it passes in the job map and directives to assign fabric values like endpts. This is done by the DVM master, and it specifically does not ask to harvest envars as the DVM master is not guaranteed local to the user. This information is added to the launch message sent to the backend daemons. We need to review the current logic to ensure this gets done correctly when the launch tool and DVM master are one and the same (i.e., |
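The two calls described above can be sketched roughly as follows. This is an illustration of the intended sequence only, not PRRTE's API: `setup_directives_t`, `job_t`, and `setup_application_sketch` are hypothetical, and filtering on an "OMPI_" prefix for the "ompi" model is an assumption made to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENV 8

/* Hypothetical directives passed to each call. */
typedef struct {
    const char *model;   /* programming model, e.g. "ompi" */
    bool harvest_envars; /* set only by the user-facing launch tool */
    bool assign_fabric;  /* set only by the DVM master at launch time */
} setup_directives_t;

/* Hypothetical job record carried from the launch tool to the master. */
typedef struct {
    const char *env[MAX_ENV];
    size_t nenv;
    bool fabric_assigned;
} job_t;

static void setup_application_sketch(job_t *job, const setup_directives_t *d,
                                     const char *const *local_env)
{
    assert(NULL != job && NULL != d);

    /* Call 1: harvest envars from the caller's own environment.
     * Assumption: the "ompi" model keeps only OMPI_-prefixed entries. */
    if (d->harvest_envars && NULL != local_env &&
        NULL != d->model && 0 == strcmp(d->model, "ompi")) {
        for (size_t i = 0; NULL != local_env[i] && job->nenv < MAX_ENV; i++) {
            if (0 == strncmp(local_env[i], "OMPI_", 5)) {
                job->env[job->nenv++] = local_env[i];
            }
        }
    }

    /* Call 2: assign fabric values. The master's local environment is
     * never harvested, since it may not be local to the user. */
    if (d->assign_fabric) {
        job->fabric_assigned = true;
    }
}
```

The case flagged for review is the one where both calls are made by the same process: the second call must still leave the harvested environment untouched.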
Yeah, something is wrong with It is therefore possible that we could get an answer back from the remote end that contains the data for the first request, but not for the second. This would cause the second request to return a "not found" error. I'm not sure if that is what is happening here, but it is a "hole" in the operation. We'll need to investigate it, but that should be as a separate issue. For now, I'd recommend disabling it. |
@rhc54 is your latest comment related to this issue?
No - I was commenting about the fact that the |
@rhc54 thanks for the explanation. I agree that the different MCA params shouldn't pollute different environments, but I don't think it is a blocker for an OMPI release. Maybe @janjust and @gpaulsen feel differently though. Re the modex test - it is now moved to the last test run. So there's nothing really gained by disabling it - but feel free to do so if it bothers you. There is something going on there still unfortunately. :(
I cannot speak for OMPI's release schedule, but PRRTE won't release until we get the sequence correct as it impacts more than just MCA params and a broader scope than just OMPI. We need to get it reviewed and fixed.
Yeah, I just don't want people wasting time retesting it every time it fails. We know we have a problem, and it is very unlikely that someone's PR is affecting it. We have an issue for it, though I think by now we are painfully aware enough that the issue is hardly required to remind us.