
Filter out stdout from all ranks except the first? #958

Open
manopapad opened this issue Sep 28, 2024 · 7 comments
@manopapad
Contributor

@tylerjereddy is trying out multi-rank Legate runs, and was surprised to see duplicated output. This is inherent to the control replication execution model of Legion¹, whereby all ranks execute the top-level program and each "masks out" the portion of the work that corresponds to it.

When we were using the Legion Python bindings (prior to Legate 24.09), we were inheriting this code from Legion, which filters out the stdout from all ranks except the first one.
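For context, the effect of that filter can be sketched roughly as follows. This is an illustration only, not the actual legion_python code linked above; `get_rank` is a stand-in for however the runtime reports the current process's rank, falling back here to common launcher environment variables.

```python
import os
import sys

# Illustrative sketch only: not the actual legion_python implementation.
# get_rank() is a stand-in for the runtime's rank query; here it falls
# back to common launcher environment variables (Slurm / Open MPI / PMI).

def get_rank():
    for var in ("SLURM_PROCID", "OMPI_COMM_WORLD_RANK", "PMI_RANK"):
        if var in os.environ:
            return int(os.environ[var])
    return 0

def suppress_nonzero_rank_stdout():
    # Replace sys.stdout with /dev/null on every rank except rank 0, so
    # prints from the replicated top-level program reach the terminal
    # only once per job.
    if get_rank() != 0:
        sys.stdout = open(os.devnull, "w")
```

Because the replacement object is still file-like, tools that write to `sys.stdout` directly (such as pytest) are silenced on non-zero ranks without any changes on their side.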

The questions then are:

(a) do we want to replicate this behavior in post-Legion-bindings Legate?
(b) do we make it the default?
(c) do we do the same for C++?

I would ask @tylerjereddy to comment more on why this mode is desirable, and also @lightsighter @elliottslaughter, who originally enabled this mode in Legion.

Footnotes

  1. Unless we introduce a client-server model on top of a control-replicated Legion run, which is a suggestion for making multi-node Jupyter runs work.

@tylerjereddy

tylerjereddy commented Sep 28, 2024

I would ask @tylerjereddy to comment more on why this mode is desirable

I'm currently seeing duplicated pytest output on the terminal with multiple ranks, and to make matters worse the run hangs, for reasons that may or may not be related. Either way, I don't think it should ever be considered acceptable to show the duplication below on the command line when running a test suite to verify the integrity of a project. It would be a nightmare to debug or make sense of for any large project that decomposes its calculations with legate/cunumeric backing, since it stacks the task of unwinding and deduplicating the test output on top of whatever the original problem is. And if the original problem is related to concurrency, the user's mental model gets polluted by the extra duplicated output as well.

I'm not particularly concerned by whatever changed under the hood, but I am concerned by the information pollution on the command line, and by the departure from the "drop-in replacement" philosophy this creates for the average Python user who just wants to naively scale up and print/debug/test their results as they always have.

legate --launcher srun --launcher-extra="-n 6" run_tests.py

============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
xpmem_attach error: : Invalid argument
xpmem_attach error: : Invalid argument
xpmem_attach error: : Invalid argument
xpmem_attach error: : Invalid argument
xpmem_attach error: : Invalid argument

@tylerjereddy

Another thing I'm curious about is the deprecation policy around behavior changes. The change in print behavior caught me by surprise and seems to have happened fairly recently, causing a difference in debug print behavior between my legate deployments and preventing me from making an apples-to-apples comparison between one that works and one that does not.

Also, launching an incorrect number of program copies can point to Slurm or MPI issues, further muddying the waters.

@Jacobfaib
Contributor

Another thing I'm curious about is deprecation policy and behavior changes

So currently, as we move out of limited availability towards general release, we are not making any backwards-compatibility guarantees or providing deprecation periods.

This lack of a deprecation period is temporary, though. We plan to implement proper deprecation periods in the near future as the low-level Legate APIs mature. Until then, you should consider every release to be breaking.

Also, we keep a changelog for every release (https://docs.nvidia.com/legate/latest/api/changes/index.html), which you can consult for any changes (though of course it does not include the stdout changes you highlighted here ;))

@elliottslaughter
Contributor

also @lightsighter @elliottslaughter, who originally enabled this mode in Legion.

One thing I just wanted to mention is that I'm pretty sure we added this feature for Legate in the first place: StanfordLegion/legion@b29c1a9

So I don't know the answers to your questions, but this was apparently a feature you wanted at some point in time.

@lightsighter
Contributor

I added the feature because I thought it was ideal to maintain the appearance of sequential execution, even though replicated execution was occurring in the implementation for scalability reasons. Four years later, it seems to have been a reasonably successful decision, since nobody (Legate or Pygion users) complained about it during that time, at least as far as I'm aware.

@manopapad
Contributor Author

We had an offline discussion with the dev team; they all agreed that the duplicated output is just noise, but were surprised that we were achieving this by overriding the behavior of the default streams. Quoting from the discussion:

we shouldn't override Python's default print() or stdout, that's just a footgun waiting to happen

Nobody had a problem with e.g. providing a custom legate printer function that does this. The reason that doesn't work is that we want a solution that:

  • only applies to the top-level task, which is the only part that gets replicated
  • only applies to Python (everyone was in agreement to not do anything to change the default behavior of stdout in C++)
  • covers output from default tools, like pytest, that cannot easily be ported to use a custom print implementation

The best compromise I can see at the moment is to offer a monkeypatching mode like the one from legion_python (override the stdout file object to suppress output from all ranks != 0), or alternatively prefix the output with a rank number, but make it opt-in. @tylerjereddy @syamajala does this sound reasonable?

Note that, even with this change, the illusion of a single controller script is still prone to breaking, e.g. if you try to write files from the top-level task, and a separate (opt-in) primitive would be needed for that.
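The rank-prefix alternative could look roughly like this. This is a sketch only: RankPrefixWriter is a hypothetical name, not an existing Legate API, and in a real implementation the rank would come from the runtime rather than the environment-variable lookup shown in the comment.

```python
import io

class RankPrefixWriter:
    """Hypothetical opt-in wrapper that tags each output line with its rank."""

    def __init__(self, stream, rank):
        self._stream = stream
        self._prefix = f"[rank {rank}] "
        self._at_line_start = True

    def write(self, text):
        # Insert the prefix at the start of every line, tracking whether
        # the previous write() call ended mid-line.
        for chunk in text.splitlines(keepends=True):
            if self._at_line_start:
                self._stream.write(self._prefix)
            self._stream.write(chunk)
            self._at_line_start = chunk.endswith("\n")

    def flush(self):
        self._stream.flush()

# Opt-in activation would look something like (rank query is a stand-in):
#   sys.stdout = RankPrefixWriter(sys.stdout,
#                                 int(os.environ.get("SLURM_PROCID", "0")))
```

This keeps all ranks' output visible but attributable, at the cost of still interleaving it; the suppress-to-rank-0 mode preserves the sequential-execution illusion instead.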

@tylerjereddy

@tylerjereddy @syamajala does this sound reasonable?

Some route to preserving the illusion/original experience of sequential execution sounds more useful than prefixing with rank number but retaining the replicated output.

I don't quite follow the advantage of retaining the replicated prints as a default, but that's obviously up to you. It still seems more like a debug-mode implementation detail that I'd only want opt-in, and even then only if it could surface useful information about decomposition details and the division of work, which it seems it doesn't.
