
Filter out stdout from all ranks except the first? #958

Open
manopapad opened this issue Sep 28, 2024 · 7 comments
@manopapad
Contributor

@tylerjereddy is trying out multi-rank Legate runs, and was surprised to see duplicated output. This is inherent to the control replication execution model of Legion¹, whereby all ranks execute the top-level program and each "masks out" the portion of the work that corresponds to it.

When we were using the Legion Python bindings (prior to Legate 24.09), we were inheriting this code from Legion, which filters out the stdout from all ranks except the first one.
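For context, the effect of that filter can be sketched roughly as follows. This is an illustration only, not the actual legion_python code linked above; `get_rank` is a stand-in for however the runtime reports the current process's rank, falling back here to common launcher environment variables.

```python
import os
import sys

# Illustrative sketch only: not the actual legion_python implementation.
# get_rank() is a stand-in for the runtime's rank query; here it falls
# back to common launcher environment variables (Slurm / Open MPI / PMI).

def get_rank():
    for var in ("SLURM_PROCID", "OMPI_COMM_WORLD_RANK", "PMI_RANK"):
        if var in os.environ:
            return int(os.environ[var])
    return 0

def suppress_nonzero_rank_stdout():
    # Replace sys.stdout with /dev/null on every rank except rank 0, so
    # prints from the replicated top-level program reach the terminal
    # only once per job.
    if get_rank() != 0:
        sys.stdout = open(os.devnull, "w")
```

Because the replacement object is still file-like, tools that write to `sys.stdout` directly (such as pytest) are silenced on non-zero ranks without any changes on their side.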

The questions then are:

(a) do we want to replicate this behavior in post-Legion-bindings Legate?
(b) do we make it the default?
(c) do we do the same for C++?

I would ask @tylerjereddy to comment more on why this mode is desirable, and also @lightsighter @elliottslaughter, who originally enabled this mode in Legion.

Footnotes

  1. Unless we introduce a client-server model on top of a control-replicated Legion run, which is a suggestion for making multi-node Jupyter runs work.

@tylerjereddy

tylerjereddy commented Sep 28, 2024

I would ask @tylerjereddy to comment more on why this mode is desirable

I'm currently seeing duplicated pytest output on the terminal with multiple ranks, and to make matters worse the run hangs, for reasons that may or may not be related. Either way, I don't think it should ever be considered acceptable to show the duplication below on the command line when running a test suite to verify the integrity of a project. It would be a nightmare to debug or make sense of for any large project that decomposes its calculations with legate/cunumeric backing, since it stacks the task of unwinding and deduplicating the test output on top of whatever the original problem is. And if the original problem is related to concurrency, the user's mental model gets polluted by the extra duplicated output as well.

I'm not particularly concerned by whatever changed under the hood, but I am concerned by the information pollution on the command line, and by the departure from the "drop-in replacement" philosophy this creates for the average Python user who just wants to naively scale up and print/debug/test their results as they always have.

legate --launcher srun --launcher-extra="-n 6" run_tests.py

============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
xpmem_attach error: : Invalid argument
xpmem_attach error: : Invalid argument
xpmem_attach error: : Invalid argument
xpmem_attach error: : Invalid argument
xpmem_attach error: : Invalid argument

@tylerjereddy

Another thing I'm curious about is the deprecation policy around behavior changes. The change in print behavior caught me by surprise and seems to have happened fairly recently, causing a difference in debug print behavior between my legate deployments and preventing me from making an apples-to-apples comparison between one that works and one that does not.

Also, launching an incorrect number of program copies can point to Slurm or MPI issues, further muddying the waters.

@Jacobfaib
Contributor

Another thing I'm curious about is deprecation policy and behavior changes

So currently, as we move out of limited availability towards general release, we are not making any backwards-compatibility guarantees or providing deprecation periods.

This lack of a deprecation period is temporary, though. We plan to implement proper deprecation periods in the near future as the low-level Legate APIs mature. Until then, you should consider every release to be breaking.

Also, we keep a changelog for every release (https://docs.nvidia.com/legate/latest/api/changes/index.html), which you can consult for any changes (though of course it does not include the stdout changes you highlighted here ;))

@elliottslaughter
Contributor

also @lightsighter @elliottslaughter, who originally enabled this mode in Legion.

One thing I just wanted to mention is that I'm pretty sure we added this feature for Legate in the first place: StanfordLegion/legion@b29c1a9

So I don't know the answers to your questions, but this was apparently a feature you wanted at some point in time.

@lightsighter
Contributor

I added the feature because I thought it was ideal to maintain the appearance of sequential execution, even though replicated execution was occurring in the implementation for scalability reasons. Four years later, it seems to have been a reasonably successful decision, since nobody (Legate or Pygion users) complained about it during that time, at least as far as I'm aware.

@manopapad
Contributor Author

We had an offline discussion with the dev team; they all agreed that the duplicated output is just noise, but were surprised that we were achieving this by overriding the behavior of the default streams. Quoting from the discussion:

we shouldn't override Python's default print() or stdout, that's just a footgun waiting to happen

Nobody had a problem with e.g. providing a custom legate printer function that does this. The reason that doesn't work is that we want a solution that:

  • only applies to the top-level task, which is the only part that gets replicated
  • only applies to Python (everyone was in agreement to not do anything to change the default behavior of stdout in C++)
  • covers output from default tools, like pytest, that cannot easily be ported to use a custom print implementation

The best compromise I can see at the moment is to offer a monkeypatching mode like the one from legion_python (override the stdout file object to suppress output from all ranks != 0), or alternatively prefix the output with a rank number, but make it opt-in. @tylerjereddy @syamajala does this sound reasonable?

Note that, even with this change, the illusion of a single controller script is still prone to breaking, e.g. if you try to write files from the top-level task, and a separate (opt-in) primitive would be needed for that.
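The rank-prefix alternative could look roughly like this. This is a sketch only: RankPrefixWriter is a hypothetical name, not an existing Legate API, and in a real implementation the rank would come from the runtime rather than the environment-variable lookup shown in the comment.

```python
import io

class RankPrefixWriter:
    """Hypothetical opt-in wrapper that tags each output line with its rank."""

    def __init__(self, stream, rank):
        self._stream = stream
        self._prefix = f"[rank {rank}] "
        self._at_line_start = True

    def write(self, text):
        # Insert the prefix at the start of every line, tracking whether
        # the previous write() call ended mid-line.
        for chunk in text.splitlines(keepends=True):
            if self._at_line_start:
                self._stream.write(self._prefix)
            self._stream.write(chunk)
            self._at_line_start = chunk.endswith("\n")

    def flush(self):
        self._stream.flush()

# Opt-in activation would look something like (rank query is a stand-in):
#   sys.stdout = RankPrefixWriter(sys.stdout,
#                                 int(os.environ.get("SLURM_PROCID", "0")))
```

This keeps all ranks' output visible but attributable, at the cost of still interleaving it; the suppress-to-rank-0 mode preserves the sequential-execution illusion instead.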

@tylerjereddy

@tylerjereddy @syamajala does this sound reasonable?

Some route to preserving the illusion/original experience of sequential execution sounds more useful than prefixing with rank number but retaining the replicated output.

I don't quite follow the advantage of retaining the replicated prints as a default, but that's obviously up to you. It still seems more like a debug-mode implementation detail that I'd only want opt-in, and even then only if it could surface useful information about decomposition details and the division of work, which it seems it doesn't.
