UCT/MM: Add user namespace handling for posix and cma transports #9213

Merged (2 commits) on Aug 25, 2023

tvegas1
Contributor

@tvegas1 tvegas1 commented Jul 13, 2023

What

Add implicit user namespace reachability checks for the posix and cma transports to allow proper transport selection. Failing to detect that a transport is unreachable leads to an error, where we should instead fall back on another transport.

Issue on posix transport:

mm_posix.c:194  UCX  ERROR open(file_name=/proc/4382/fd/23 flags=0x0) failed: No such file or directory

Issue on cma transport:

cma_ep.c:84   process_vm_readv(pid=75643 {0x2552940,12480}-->{0x1be29e0,12480}) returned -1: Operation not permitted

Why?

The cma transport needs the ptrace capability, and the posix transport also requires it when using /proc/<pid>/fd/. In short, the ptrace capability is local to the current user namespace (and its child user namespaces).
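
As an illustration of the namespace locality described above, whether two processes on the same host share a user namespace can be checked by comparing their namespace links under /proc. This is a minimal Python sketch, not UCX code: the helper names `user_ns_id` and `same_user_ns` are hypothetical, and reading another process's link is itself subject to a ptrace-style access check, so it may fail with a permission error.

```python
import os


def user_ns_id(pid="self"):
    """Return the user namespace link target of a process, e.g. 'user:[4026531837]'."""
    # /proc/<pid>/ns/user is a symlink whose target identifies the namespace.
    return os.readlink(f"/proc/{pid}/ns/user")


def same_user_ns(pid_a, pid_b):
    """True when both processes report the same user namespace identifier."""
    return user_ns_id(pid_a) == user_ns_id(pid_b)


if __name__ == "__main__":
    # Comparing a process with itself always matches; pass real PIDs to compare peers.
    print(user_ns_id(), same_user_ns("self", os.getpid()))
```

Two processes started with and without `apptainer run --userns` would be expected to report different identifiers here.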

Current UCX reachability checks for the relevant intra-node transports:

  • sysv: IPC namespace matching
  • posix: IPC namespace and PID namespace matching
  • cma: IPC namespace and PID namespace matching

When two UCX instances are executed in different:

  • IPC namespaces: rc, for instance, can be used as a fallback if available
  • PID namespaces: we detect posix/cma unreachability and properly fall back on the sysv transport.
  • USER namespaces: we fail to detect unreachability, try to use cma/posix, and terminate with an error.

posix transport has two modes:

  • use proc link (default): needs ptrace to use /proc/<pid>/fd/; it avoids potential leftovers
  • else: standard shm, which can leave leftovers in some cases

The posix non-proc link mode does not need ptrace, so it can work even in different user namespaces.

How?

The reachability check needs to actually try to establish the connection.

Proposed handling when user namespaces are different:

  • posix: don't try to fall back on non-proc-link; declare unreachability in order to fall back on sysv
    • simpler, as the info might not be available in time; sysv is very close, and unrelated md alloc can still take place on posix
    • it remains possible to disable use_proc_link if posix is really needed
  • cma: use a dummy cma read to detect reachability by exercising the capability
    • in case one of the two is a user namespace parent, reachability can return true on one side and false on the other
      • could lead to an ep reconfiguration error

Tests

Using UCX_TLS=posix,cma, both lanes must be usable:

  • Manually tested starting from the same non default user namespace
  • Manually tested from default user namespace

Adding user namespaces Azure tests below.

With same non-default namespaces:

  • check that each transport reports reachability

With different user namespaces:

  • posix transport:
    • non-proc link: use posix
    • proc link: fallback on sysv
  • cma transport:
    • fallback on sysv

With different pid namespaces:

  • posix/cma transport: make sure to fallback on sysv

Test output:

==== Running perftest namespace positive tests ====
==== Running perftest PID namespace test for posix ====
==== Running perftest PID namespace test for cma ====
==== Running perftest USER namespace test for posix ====
==== Running perftest USER namespace test for cma ====
==== Running perftest USER namespace test for posix non proc link ====

@hoopoepg
Contributor

why not just use UCX_TLS=shm,xxxxx transport instead of posix/cma for shared memory based TL's?

@tvegas1
Contributor Author

tvegas1 commented Jul 13, 2023

> why not just use UCX_TLS=shm,xxxxx transport instead of posix/cma for shared memory based TL's?

For manual testing, using UCX_TLS=posix,cma directly confirms that both lanes are properly set up when the namespace is OK.

For testing in the shell script, pairing the more relaxed sysv with each transport in turn tests both that transport's unreachability and the sysv selection (any error will be specific to that transport), and success is expected. Expecting success seems better than expecting failure, since a failure could be caused by something else happening before the is-reachable call.

@panda1100
Contributor

Hi @tvegas1 -san,

I confirmed this PR works with Apptainer. I would like to know which transport was actually used (e.g. posix, sysv, etc.).
Is there any way to find out which transport was actually used?
I tried UCX_LOG_LEVEL=debug, but that generates a lot of lines and makes it hard to identify.

This is how I run apptainer with your PR (our test environment has OmniPath).

mpirun -np 64 -mca pml ucx \
--mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=posix,cma,ib \
apptainer run benchmark.sif

@hoopoepg
Contributor

try UCX_LOG_LEVEL=info

@panda1100
Contributor

Thank you @hoopoepg -san!

It looks like it is falling back to rc_verbs/hfi1_0:1

UCX_LOG_LEVEL=info \
mpirun -np 2 -mca pml ucx \
--mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=posix,cma,ib \
apptainer run benchmark.sif
[1689256535.650282] [c6:153046:0]     ucp_context.c:2123 UCX  INFO  Version 1.16.0 (loaded from /opt/ucx/lib/libucp.so.0)
[1689256535.650285] [c6:153047:0]     ucp_context.c:2123 UCX  INFO  Version 1.16.0 (loaded from /opt/ucx/lib/libucp.so.0)
[1689256535.695060] [c6:153047:0]          parser.c:2044 UCX  INFO  UCX_* env variables: UCX_TLS=posix,cma,ib UCX_LOG_LEVEL=info UCX_NET_DEVICES=hfi1_0:1
[1689256535.695100] [c6:153046:0]          parser.c:2044 UCX  INFO  UCX_* env variables: UCX_TLS=posix,cma,ib UCX_LOG_LEVEL=info UCX_NET_DEVICES=hfi1_0:1
[1689256535.696364] [c6:153046:0]     ucp_context.c:2123 UCX  INFO  Version 1.16.0 (loaded from /opt/ucx/lib/libucp.so.0)
[1689256535.696385] [c6:153047:0]     ucp_context.c:2123 UCX  INFO  Version 1.16.0 (loaded from /opt/ucx/lib/libucp.so.0)
[1689256535.708003] [c6:153046:0]      ucp_worker.c:1871 UCX  INFO    0x1957550 self cfg#0 tag(posix/memory cma/memory rc_verbs/hfi1_0:1)
[1689256535.708005] [c6:153047:0]      ucp_worker.c:1871 UCX  INFO    0x1d79550 self cfg#0 tag(posix/memory cma/memory rc_verbs/hfi1_0:1)
[1689256535.709932] [c6:153047:0]      ucp_worker.c:1871 UCX  INFO    0x1d79550 intra-node cfg#1 tag(rc_verbs/hfi1_0:1)
[1689256535.709929] [c6:153046:0]      ucp_worker.c:1871 UCX  INFO    0x1957550 intra-node cfg#1 tag(rc_verbs/hfi1_0:1)

@hoopoepg
Contributor

hmmm, as far as I can see it uses all three transports (shm + ib) for on-host communications:

[1689256535.708003] [c6:153046:0]      ucp_worker.c:1871 UCX  INFO    0x1957550 self cfg#0 tag(posix/memory cma/memory rc_verbs/hfi1_0:1)
[1689256535.708005] [c6:153047:0]      ucp_worker.c:1871 UCX  INFO    0x1d79550 self cfg#0 tag(posix/memory cma/memory rc_verbs/hfi1_0:1)

@panda1100
Contributor

@hoopoepg -san,

What is the difference between self and intra-node??

This is when we use user namespace. (apptainer)

[1689256535.708005] [c6:153047:0]      ucp_worker.c:1871 UCX  INFO    0x1d79550 self cfg#0 tag(posix/memory cma/memory rc_verbs/hfi1_0:1)
[1689256535.709932] [c6:153047:0]      ucp_worker.c:1871 UCX  INFO    0x1d79550 intra-node cfg#1 tag(rc_verbs/hfi1_0:1)

This is when we did not use user namespace. (apptainer-suid)

[1689257918.224606] [c7:2950106:0]      ucp_worker.c:1871 UCX  INFO    0x26b4960 self cfg#0 tag(posix/memory cma/memory)
[1689257918.224604] [c7:2950273:0]      ucp_worker.c:1871 UCX  INFO    0x268c980 intra-node cfg#1 tag(posix/memory cma/memory)

I used the same command for both cases.

UCX_LOG_LEVEL=info \
mpirun -np 64 -mca pml ucx \
--mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=posix,cma,ib \
apptainer run benchmark.sif

@tvegas1
Contributor Author

tvegas1 commented Jul 13, 2023

> @hoopoepg -san,
>
> What is the difference between self and intra-node??

It seems to work as intended. As I understand it, self is the same process, so the same namespaces; intra-node is between different processes, so when the user namespace is unshared we have to use something other than posix/cma, meaning ib.

If you added sysv, I guess sysv would be used (unless the IPC namespace is also unshared).
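
To compare the self and intra-node selections without reading the log by eye, the `cfg#` lines printed at UCX_LOG_LEVEL=info can be parsed. This is a rough Python sketch based only on the line format shown in this thread; `parse_cfg_line` is a hypothetical helper, and it assumes one `feature(...)` group per line, as in the output above.

```python
import re

# Matches e.g. "... 0x1d79550 intra-node cfg#1 tag(rc_verbs/hfi1_0:1)"
CFG_RE = re.compile(r"(\S+)\s+cfg#(\d+)\s+(\w+)\(([^)]*)\)")


def parse_cfg_line(line):
    """Return (scope, cfg_index, feature, [transport names]) or None if no match."""
    m = CFG_RE.search(line)
    if m is None:
        return None
    scope, index, feature, lanes = m.groups()
    # Each lane is "<transport>/<device>"; keep only the transport name.
    transports = [lane.split("/", 1)[0] for lane in lanes.split()]
    return scope, int(index), feature, transports


if __name__ == "__main__":
    sample = ("[1689256535.709932] [c6:153047:0]      ucp_worker.c:1871 UCX  INFO    "
              "0x1d79550 intra-node cfg#1 tag(rc_verbs/hfi1_0:1)")
    print(parse_cfg_line(sample))
```

Feeding it the two lines quoted above would show posix, cma and rc_verbs for self, and only rc_verbs for intra-node.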

@panda1100
Contributor

panda1100 commented Jul 13, 2023

Thank you @tvegas1 -san :)

user namespace (apptainer) & added sysv

UCX_LOG_LEVEL=info \
mpirun -np 64 -mca pml ucx \
--mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=posix,sysv,cma,ib \
apptainer run benchmark.sif

This is great!!

[1689259142.663742] [c6:156475:0]      ucp_worker.c:1871 UCX  INFO    0x1b7ace0 self cfg#0 tag(sysv/memory cma/memory)
[1689259142.663819] [c6:156595:0]      ucp_worker.c:1871 UCX  INFO    0x1a51d10 intra-node cfg#1 tag(sysv/memory rc_verbs/hfi1_0:1)

I'm using Intel MPI Benchmark (IMB-MPI1 Allreduce) for this test.

@panda1100
Contributor

panda1100 commented Jul 13, 2023

@tvegas1 -san,

without user namespace (apptainer-suid), I added sysv after posix

UCX_LOG_LEVEL=info \
mpirun -np 64 -mca pml ucx \
--mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=posix,sysv,cma,ib \
apptainer run benchmark.sif

and then, even if user namespace is not used, the transport falls back to sysv.
I expected posix. Is this the intended behavior??

[1689264496.026032] [c7:2972980:0]      ucp_worker.c:1871 UCX  INFO    0x264c9c0 self cfg#0 tag(sysv/memory cma/memory)
[1689264496.026215] [c7:2973095:0]      ucp_worker.c:1871 UCX  INFO    0xd55980 intra-node cfg#1 tag(sysv/memory cma/memory)

when I didn't add sysv (UCX_TLS=posix,cma,ib) (without user namespace),
transport looks like the following

[1689264366.722084] [c7:2966098:0]      ucp_worker.c:1871 UCX  INFO    0x24ea9a0 self cfg#0 tag(posix/memory cma/memory)
[1689264366.722223] [c7:2966018:0]      ucp_worker.c:1871 UCX  INFO    0x8719a0 intra-node cfg#1 tag(posix/memory cma/memory)

@tvegas1
Contributor Author

tvegas1 commented Jul 13, 2023

> without user namespace (apptainer-suid), I added sysv after posix

I see in the code that the order in UCX_TLS is not meaningful, and I think it is possible that sysv is simply always preferred in your case. Could you confirm that this behavior is the same with and without this PR (without using user namespaces)?

@panda1100
Contributor

panda1100 commented Jul 13, 2023

without PR, without user namespace, UCX_TLS=posix,sysv,cma,ib (UCX v1.10.1)

UCX_LOG_LEVEL=info \
mpirun -np 64 -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=posix,sysv,cma,ib apptainer run benchmark.sif

I got this output

[1689266588.540804] [c8:338481:0]     ucp_worker.c:1720 UCX  INFO  ep_cfg[0]: tag(posix/memory cma/memory rc_verbs/hfi1_0:1); 

@tvegas1
Contributor Author

tvegas1 commented Jul 13, 2023

I tried stock v1.10.0, with no patch, using mpirun on perftest with UCX_TLS=posix,sysv, and I see sysv being selected. It would be interesting if you could try the PR on its current base/version.

@panda1100
Contributor

panda1100 commented Jul 13, 2023

Thank you and good morning @tvegas1 -san,

I'll test your PR on v1.10.1 and will update here.

(note)
I share my setup just to be sure. If you have any recommendation for configuration, please let me know.

I used UCX v1.10.1 from GitHub for previous test (without your PR).

git clone https://github.com/openucx/ucx.git ucx
cd ucx
git checkout v1.10.1
./autogen.sh
mkdir build
cd build
../configure --prefix=/home/ciq/ysenda/opt/ucx
make -j $(nproc)
make install

This is configuration log for UCX

configure: =========================================================
configure: UCX build configuration:
configure:       Build prefix:   /home/ciq/ysenda/opt/ucx
configure: Preprocessor flags:   -DCPU_FLAGS="" -I${abs_top_srcdir}/src -I${abs_top_builddir} -I${abs_top_builddir}/src
configure:         C compiler:   gcc -O3 -g -Wall -Werror -funwind-tables -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-label -Wno-long-long -Wno-endif-labels -Wno-sign-compare -Wno-multichar -Wno-deprecated-declarations -Winvalid-pch -Wno-pointer-sign -Werror-implicit-function-declaration -Wno-format-zero-length -Wnested-externs -Wshadow -Werror=declaration-after-statement
configure:       C++ compiler:   g++ -O3 -g -Wall -Werror -funwind-tables -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-label -Wno-long-long -Wno-endif-labels -Wno-sign-compare -Wno-multichar -Wno-deprecated-declarations -Winvalid-pch
configure:       Multi-thread:   disabled
configure:          MPI tests:   disabled
configure:      Devel headers:   no
configure:           Bindings:   < >
configure:        UCT modules:   < ib rdmacm cma >
configure:       CUDA modules:   < >
configure:       ROCM modules:   < >
configure:         IB modules:   < >
configure:        UCM modules:   < >
configure:       Perf modules:   < >
configure: =========================================================

and this is configuration log for OpenMPI

Transports
-----------------------
Cisco usNIC: no
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): yes
Intel TrueScale (PSM): no
Mellanox MXM: no
Open UCX: yes
OpenFabrics OFI Libfabric: no
OpenFabrics Verbs: yes
Portals4: no
Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: no
Shared memory/XPMEM: no
TCP: yes

@panda1100
Contributor

panda1100 commented Jul 14, 2023

Hi @tvegas1 -san,

I applied your PR to v1.10.1.

wget https://patch-diff.githubusercontent.com/raw/openucx/ucx/pull/9213.patch
git clone https://github.com/openucx/ucx.git ucx
cd ucx
git checkout v1.10.1
#APPLY PATCH HERE
patch -p 1 < ../9213.patch
./autogen.sh
mkdir build
cd build
../configure --prefix=/home/ciq/ysenda-1.10.1-patch/opt/ucx
make -j $(nproc)
make install

I got one failure on contrib/test_jenkins.sh but ignored it, since it is not related to this test.

[ciq@admin1 ucx]$ patch -p 1 < ../9213.patch 
patching file config/m4/sysdep.m4
Hunk #1 succeeded at 279 (offset 11 lines).
patching file src/uct/sm/scopy/cma/cma_iface.c
Hunk #2 succeeded at 69 (offset -2 lines).
Hunk #3 succeeded at 91 (offset -2 lines).
patching file src/uct/sm/mm/posix/mm_posix.c
Hunk #1 succeeded at 30 (offset -3 lines).
Hunk #2 succeeded at 58 with fuzz 1 (offset -4 lines).
Hunk #3 succeeded at 96 (offset -25 lines).
Hunk #4 succeeded at 367 (offset -39 lines).
Hunk #5 succeeded at 458 (offset -39 lines).
Hunk #6 succeeded at 489 (offset -39 lines).
Hunk #7 succeeded at 603 (offset -39 lines).
patching file contrib/test_jenkins.sh
Hunk #1 succeeded at 1281 with fuzz 1 (offset 267 lines).
Hunk #2 FAILED at 1366.
1 out of 2 hunks FAILED -- saving rejects to file contrib/test_jenkins.sh.rej

test results

UCX v1.10.1 + 9213.patch / apptainer without user namespace / UCX_TLS=posix,cma,ib

[1689295751.617820] [c7:3245078:0]     ucp_worker.c:1720 UCX  INFO  \
ep_cfg[0]: tag(posix/memory cma/memory rc_verbs/hfi1_0:1); 

UCX v1.10.1 + 9213.patch / apptainer without user namespace / UCX_TLS=posix,sysv,cma,ib

[1689295781.909601] [c7:3247723:1]     ucp_worker.c:1720 UCX  INFO  \
ep_cfg[1]: tag(posix/memory cma/memory rc_verbs/hfi1_0:1); 

for reference, UCX v1.10.1 w/o patch / apptainer without user namespace / UCX_TLS=posix,sysv,cma,ib

[1689296484.229063] [c7:3255315:1]     ucp_worker.c:1720 UCX  INFO  \
ep_cfg[1]: tag(posix/memory cma/memory rc_verbs/hfi1_0:1); 

for reference, your PR / apptainer without user namespace / UCX_TLS=posix,sysv,cma,ib

[1689296353.877822] [c7:3251829:0]      ucp_worker.c:1871 UCX  INFO    \
0x1cd79c0 self cfg#0 tag(sysv/memory cma/memory)
[1689296353.877825] [c7:3251777:0]      ucp_worker.c:1871 UCX  INFO    \
0x87b9f0 intra-node cfg#1 tag(sysv/memory cma/memory)

for reference, master / apptainer without user namespace / UCX_TLS=posix,sysv,cma,ib

[1689298209.859693] [c7:3503741:0]      ucp_worker.c:1871 UCX  INFO    \
0x175c760 self cfg#0 tag(sysv/memory cma/memory)
[1689298209.859731] [c7:3503645:0]      ucp_worker.c:1871 UCX  INFO    \
0x123b9c0 intra-node cfg#1 tag(sysv/memory cma/memory)

@panda1100
Contributor

panda1100 commented Jul 14, 2023

I cannot run the benchmark under the following conditions (while your PR does work):

UCX v1.10.1 + 9213.patch / apptainer with user namespace / UCX_TLS=posix,sysv,cma,ib

UCX_LOG_LEVEL=info \
mpirun -np 64 -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=posix,sysv,cma,ib apptainer run benchmark.sif
[1689308725.299938] [c6:182872:0]     ucp_worker.c:1720 UCX  INFO  ep_cfg[0]: tag(posix/memory cma/memory rc_verbs/hfi1_0:1); 
[1689308725.299936] [c6:182871:0]     ucp_worker.c:1720 UCX  INFO  ep_cfg[0]: tag(posix/memory cma/memory rc_verbs/hfi1_0:1); 
[1689308725.306641] [c6:182872:0]       mm_posix.c:195  UCX  ERROR open(file_name=/proc/182871/fd/20 flags=0x0) failed: Permission denied
[1689308725.306641] [c6:182871:0]       mm_posix.c:195  UCX  ERROR open(file_name=/proc/182872/fd/20 flags=0x0) failed: Permission denied
[1689308725.306663] [c6:182871:0]          mm_ep.c:155  UCX  ERROR mm ep failed to connect to remote FIFO id 0xc00000050002ca58: Shared memory error
[1689308725.306663] [c6:182872:0]          mm_ep.c:155  UCX  ERROR mm ep failed to connect to remote FIFO id 0xc00000050002ca57: Shared memory error
[c6:182872] ../../../../../ompi/mca/pml/ucx/pml_ucx.c:424  Error: ucp_ep_create(proc=1) failed: Shared memory error
[c6:182871] ../../../../../ompi/mca/pml/ucx/pml_ucx.c:424  Error: ucp_ep_create(proc=0) failed: Shared memory error
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
[c6:182872] *** An error occurred in MPI_Init
[c6:182872] *** reported by process [867106817,0]
[c6:182872] *** on a NULL communicator
[c6:182872] *** Unknown error
[c6:182872] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[c6:182872] ***    and potentially your MPI job)
[c6:182787] PMIX ERROR: UNREACHABLE in file ../../../../../../../opal/mca/pmix/pmix3x/pmix/src/server/pmix_server.c at line 2198
[c6:182787] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
[c6:182787] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[c6:182787] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle

UCX v1.10.1 + 9213.patch / apptainer with user namespace / UCX_TLS=sysv,cma,ib

UCX_LOG_LEVEL=info \
mpirun -np 64 -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=sysv,cma,ib apptainer run benchmark.sif
[1689308427.139331] [c6:179053:0]     ucp_worker.c:1720 UCX  INFO  ep_cfg[0]: tag(sysv/memory cma/memory rc_verbs/hfi1_0:1); 
#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
# ( 62 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.19         0.00
            1         1000         1.18         0.84
            2         1000         1.19         1.68
            4         1000         1.19         3.37
            8         1000         1.19         6.74
           16         1000         1.16        13.75
           32         1000         1.13        28.40
           64         1000         1.11        57.89
          128         1000         1.27       100.99
          256         1000         1.28       200.65
          512         1000         1.51       338.20
         1024         1000         1.90       538.29
         2048         1000         2.35       870.67
         4096         1000         3.17      1292.83
         8192         1000         4.82      1698.26
        16384         1000         7.48      2191.43
        32768         1000        11.81      2774.71
        65536          640        21.49      3049.87
       131072          320        41.27      3176.19
[1689308427.494032] [c6:178639:0]         cma_ep.c:87   UCX  ERROR process_vm_readv(pid=178713 length=262144) returned -1: Operation not permitted
[c6:178639] *** An error occurred in MPI_Recv
[c6:178639] *** reported by process [1445724161,1]
[c6:178639] *** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
[c6:178639] *** MPI_ERR_INTERN: internal error
[c6:178639] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[c6:178639] ***    and potentially your MPI job)

UCX master + 9213.patch / apptainer with user namespace / UCX_TLS=posix,sysv,cma,ib
UCX master + 9213.patch / apptainer with user namespace / UCX_TLS=sysv,cma,ib

UCX_LOG_LEVEL=info \
mpirun -np 64 -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=sysv,cma,ib apptainer run --userns benchmark.sif
[1689309100.278744] [c7:3772721:0]      ucp_worker.c:1871 UCX  INFO 0x16f4cc0 \
self cfg#0 tag(sysv/memory cma/memory)
[1689309100.278883] [c7:3772697:0]      ucp_worker.c:1871 UCX  INFO 0x1ddfcd0 \
intra-node cfg#1 tag(sysv/memory cma/memory)
#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
# ( 62 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.27         0.00
            1         1000         1.24         0.80
            2         1000         1.28         1.56
            4         1000         1.37         2.91
            8         1000         1.26         6.36
           16         1000         1.24        12.88
           32         1000         1.34        23.84
           64         1000         1.33        48.07
          128         1000         1.45        88.58
          256         1000         1.46       175.00
          512         1000         1.86       274.82
         1024         1000         2.26       453.87
         2048         1000         2.45       837.33
         4096         1000         3.17      1293.24
         8192         1000         4.82      1698.31
        16384         1000         7.47      2192.71
        32768         1000        13.27      2468.44
        65536          640        21.78      3008.68
       131072          320        41.83      3133.23
[1689309100.607533] [c7:3772200:0]          ucp_ep.c:1510 UCX  DIAG  ep 0x7f92601415a8: error 'Connection reset by remote peer' on cma/memory will not be handled since no error callback is installed
[c7:3772200:0:3772200]      cma_ep.c:84   process_vm_readv(pid=3772340 {0x7f9260016010,262144}-->{0x7f7c280be010,262144}) returned -1: Operation not permitted
==== backtrace (tid:3772200) ====
 0  /opt/ucx/lib/libucs.so.0(ucs_handle_error+0x294) [0x7f92533e6834]
 1  /opt/ucx/lib/libucs.so.0(ucs_fatal_error_message+0xb0) [0x7f92533e37e0]
 2  /opt/ucx/lib/libucs.so.0(ucs_log_default_handler+0xf61) [0x7f92533e8441]
 3  /opt/ucx/lib/libucs.so.0(ucs_log_dispatch+0xcc) [0x7f92533e880c]
 4  /opt/ucx/lib/ucx/libuct_cma.so.0(+0x2713) [0x7f9251b36713]
 5  /opt/ucx/lib/ucx/libuct_cma.so.0(uct_cma_ep_tx+0x275) [0x7f9251b36be5]
 6  /opt/ucx/lib/libuct.so.0(uct_scopy_ep_progress_tx+0x63) [0x7f9253643bd3]
 7  /opt/ucx/lib/libucs.so.0(ucs_arbiter_dispatch_nonempty+0xf2) [0x7f92533d7a32]
 8  /opt/ucx/lib/libuct.so.0(uct_scopy_iface_progress+0x59) [0x7f92536435b9]
 9  /opt/ucx/lib/libucs.so.0(+0x24086) [0x7f92533d9086]
10  /opt/ucx/lib/libucs.so.0(+0x245c7) [0x7f92533d95c7]
11  /opt/ucx/lib/libucp.so.0(ucp_worker_progress+0x3a) [0x7f92538d10ba]
12  /opt/ompi/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_recv+0x130) [0x7f9253df7e00]
13  /opt/ompi/lib/libmpi.so.40(MPI_Recv+0x145) [0x7f92643354e5]
14  /opt/mpi-benchmarks/src_c/IMB-MPI1() [0x40b02d]
15  /opt/mpi-benchmarks/src_c/IMB-MPI1() [0x4071b1]
16  /opt/mpi-benchmarks/src_c/IMB-MPI1() [0x402440]
17  /lib64/libc.so.6(__libc_start_main+0xe5) [0x7f9263d00d85]
18  /opt/mpi-benchmarks/src_c/IMB-MPI1() [0x401ebe]
=================================
[c7:3772200] *** Process received signal ***
[c7:3772200] Signal: Aborted (6)
[c7:3772200] Signal code:  (-6)
[c7:3772200] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x7f926409dcf0]
[c7:3772200] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f9263d14acf]
[c7:3772200] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f9263ce7ea5]
[c7:3772200] [ 3] /opt/ucx/lib/libucs.so.0(+0x2e7e5)[0x7f92533e37e5]
[c7:3772200] [ 4] /opt/ucx/lib/libucs.so.0(ucs_log_default_handler+0xf61)[0x7f92533e8441]
[c7:3772200] [ 5] /opt/ucx/lib/libucs.so.0(ucs_log_dispatch+0xcc)[0x7f92533e880c]
[c7:3772200] [ 6] /opt/ucx/lib/ucx/libuct_cma.so.0(+0x2713)[0x7f9251b36713]
[c7:3772200] [ 7] /opt/ucx/lib/ucx/libuct_cma.so.0(uct_cma_ep_tx+0x275)[0x7f9251b36be5]
[c7:3772200] [ 8] /opt/ucx/lib/libuct.so.0(uct_scopy_ep_progress_tx+0x63)[0x7f9253643bd3]
[c7:3772200] [ 9] /opt/ucx/lib/libucs.so.0(ucs_arbiter_dispatch_nonempty+0xf2)[0x7f92533d7a32]
[c7:3772200] [10] /opt/ucx/lib/libuct.so.0(uct_scopy_iface_progress+0x59)[0x7f92536435b9]
[c7:3772200] [11] /opt/ucx/lib/libucs.so.0(+0x24086)[0x7f92533d9086]
[c7:3772200] [12] /opt/ucx/lib/libucs.so.0(+0x245c7)[0x7f92533d95c7]
[c7:3772200] [13] /opt/ucx/lib/libucp.so.0(ucp_worker_progress+0x3a)[0x7f92538d10ba]
[c7:3772200] [14] /opt/ompi/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_recv+0x130)[0x7f9253df7e00]
[c7:3772200] [15] /opt/ompi/lib/libmpi.so.40(MPI_Recv+0x145)[0x7f92643354e5]
[c7:3772200] [16] /opt/mpi-benchmarks/src_c/IMB-MPI1[0x40b02d]
[c7:3772200] [17] /opt/mpi-benchmarks/src_c/IMB-MPI1[0x4071b1]
[c7:3772200] [18] /opt/mpi-benchmarks/src_c/IMB-MPI1[0x402440]
[c7:3772200] [19] /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f9263d00d85]
[c7:3772200] [20] /opt/mpi-benchmarks/src_c/IMB-MPI1[0x401ebe]
[c7:3772200] *** End of error message ***
/.singularity.d/runscript: line 3: 3772200 Aborted                 (core dumped) /opt/mpi-benchmarks/src_c/IMB-MPI1
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[51686,1],1]
  Exit code:    134
--------------------------------------------------------------------------

I actually thought UCX master + 9213.patch is kind of equivalent to your PR.. Maybe I am missing something.

@tvegas1
Contributor Author

tvegas1 commented Jul 14, 2023

Thanks for running the tests. I will try to set up apptainer on my side, but expect a delay. For posix, it seems the ptrace capability could be missing if you are running without a user namespace, as I don't see the --userns option. For the cma part to work, the patch should be built with kcmp:

  • ucx_info -b | grep KCMP: you should have kcmp, and the syscall should be working/allowed; you could try clearing the seccomp list to see if it is involved.

@panda1100
Contributor

panda1100 commented Jul 14, 2023

Thank you @tvegas1 -san,

Apptainer has two different packages, apptainer-suid and apptainer.

  • apptainer-suid doesn't use "user namespace" (only use "mount namespace" by default)
  • apptainer does use "user namespace" (and "mount namespace" by default)

Both packages install the apptainer command. The command name is the same, but one doesn't use a "user namespace" while the other does; it depends on which package you installed. That's why you don't see the --userns option in my log. I have two different nodes: one has apptainer-suid installed, the other has apptainer.

However, the apptainer command that comes from the apptainer-suid package has a --userns switch, so you can choose whether or not to use a "user namespace" with this switch.

# no user namespace
apptainer run app.sif
# user namespace
apptainer run --userns app.sif

I can share my definition files that I used for my tests, if you need.

To install Apptainer on a fresh system:

For Ubuntu 22.04/20.04

sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt install -y apptainer-suid

For RockyLinux 8/9 (EL8/EL9)

sudo dnf install -y epel-release
sudo dnf install -y apptainer-suid

Thank you for your support.

@tvegas1
Contributor Author

tvegas1 commented Jul 14, 2023

Thanks, could you please share the definition file?

@panda1100
Contributor

> for cma part to work, patch should be built with kcmp:
> ucx_info -b | grep KCMP, you should have kcmp, and the syscall should be working/allowed, you could try to clear the seccomp list to see if it is involved.

About KCMP: I checked "your PR", "master+9213.patch" and "v1.10.1+9213.patch", and it looks like all versions have KCMP. I'll check the other things as well.

$ export PATH=~/ysenda/pr-9213/opt/ucx/bin:$PATH
$ ucx_info -b | grep KCMP
#define HAVE_LINUX_KCMP_H         1

$ export PATH=~/ysenda/master-patch/opt/ucx/bin:$PATH
$ ucx_info -b | grep KCMP
#define HAVE_LINUX_KCMP_H         1

$ export PATH=~/ysenda/1.10.1-patch/opt/ucx/bin:$PATH
$ ucx_info -b | grep KCMP
#define HAVE_LINUX_KCMP_H         1

@panda1100
Contributor

panda1100 commented Jul 14, 2023

@tvegas1 -san,

> Thanks, could you please share the definition file?

Yes.

This is master+9213.patch example. (Our test system has OmniPath.)

Definition file (benchmark.def)

Bootstrap: docker
From: rockylinux/rockylinux:8

%post
    dnf -y install wget git gcc gcc-c++ make file gcc-gfortran bzip2 \
        dnf-plugins-core findutils librdmacm-devel epel-release patch
    dnf -y group install "Development tools"
    crb enable    

    dnf install -y libpsm2 libpsm2-devel numactl-devel
    
    wget https://patch-diff.githubusercontent.com/raw/openucx/ucx/pull/9213.patch
    
    git clone https://github.com/openucx/ucx.git ucx
    cd ucx
    #APPLY PATCH HERE
    patch -p 1 < ../9213.patch
    ./autogen.sh
    mkdir build
    cd build
    ../configure --prefix=/opt/ucx
    make -j $(nproc)
    make install

    git clone --recurse-submodules -b v4.1.5 https://github.com/open-mpi/ompi.git
    cd ompi
    ./autogen.pl
    mkdir build
    cd build
    ../configure --prefix=/opt/ompi --with-ucx=/opt/ucx
    make -j $(nproc)
    make install
    
    cd /opt
    git clone https://github.com/intel/mpi-benchmarks.git
    cd mpi-benchmarks/src_c
    export PATH=/opt/ompi/bin:$PATH
    make all

%runscript
    /opt/mpi-benchmarks/src_c/IMB-MPI1 "$@"

build image

cd /home/ciq/ysenda-master
apptainer build benchmark.sif benchmark.def

Host side setup example against above definition file

cd /home/ciq/ysenda-master
mkdir -p opt/{ucx,ompi}

cd /home/ciq/ysenda-master
wget https://patch-diff.githubusercontent.com/raw/openucx/ucx/pull/9213.patch
git clone https://github.com/openucx/ucx.git ucx
cd ucx
#APPLY PATCH HERE
patch -p 1 < ../9213.patch
./autogen.sh
mkdir build
cd build
../configure --prefix=/home/ciq/ysenda-master/opt/ucx
make -j $(nproc)
make install

cd /home/ciq/ysenda-master
git clone --recurse-submodules -b v4.1.5 https://github.com/open-mpi/ompi.git
cd ompi
./autogen.pl
mkdir build
cd build
../configure --prefix=/home/ciq/ysenda-master/opt/ompi --with-ucx=/home/ciq/ysenda-master/opt/ucx
make -j $(nproc)
make install

and corresponding environment variables on host side

export PATH=/home/ciq/ysenda-master/opt/ompi/bin:$PATH
export PATH=/home/ciq/ysenda-master/opt/ucx/bin:$PATH

run container (I assume you installed apptainer-suid package for this test.)

w/o user namespace

UCX_LOG_LEVEL=info \
mpirun -np 64 -mca pml ucx \
--mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=posix,cma,ib \
apptainer run benchmark.sif Alltoall 

w/ user namespace

UCX_LOG_LEVEL=info \
mpirun -np 64 -mca pml ucx \
--mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=posix,cma,ib \
apptainer run --userns benchmark.sif Alltoall

@tvegas1
Contributor Author

tvegas1 commented Jul 14, 2023

Thanks for the spec file. I built a UCX-only container and tested the PR code with two apptainer run instances running ucx_perftest. When --userns is used, the error is gone. I will also test with mpirun+osu to check that it works there, but expect some delay.

@panda1100
Contributor

panda1100 commented Jul 14, 2023

What error did you get without user namespace? Could you please share the output?

The issue we hit is due to the following user namespace differences:

mpirun -np 2 apptainer run --userns ... (host namespace #0)
|_ apptainer
|  |_ process 1 (user namespace #1)
|_ apptainer
   |_ process 2 (user namespace #2)
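
One way to observe the difference sketched in the tree is to compare the user namespace links under /proc. The sketch below is illustrative only: example_same_user_ns is a hypothetical name, and it assumes both processes share a PID namespace and a mounted /proc, so that peer_pid resolves to the intended process.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the PR: compare the user namespace of
 * the calling process with that of peer_pid.
 * Returns 1 if both are in the same user namespace, 0 if they differ, and
 * -1 if either namespace link cannot be inspected.
 */
static int example_same_user_ns(pid_t peer_pid)
{
    struct stat self_st, peer_st;
    char path[64];

    assert(peer_pid > 0);

    if (stat("/proc/self/ns/user", &self_st) != 0) {
        return -1;
    }

    snprintf(path, sizeof(path), "/proc/%ld/ns/user", (long)peer_pid);
    if (stat(path, &peer_st) != 0) {
        return -1; /* for example, when access to the peer is denied */
    }

    /* Shared namespaces resolve to the same device and inode */
    return (self_st.st_dev == peer_st.st_dev) &&
           (self_st.st_ino == peer_st.st_ino);
}
```

In the mpirun tree above, a call made from process 1 with the pid of process 2 would be expected to return 0 or -1, since each apptainer instance creates its own user namespace.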

@tvegas1
Contributor Author

tvegas1 commented Jul 14, 2023

I specified the posix TL and manually ran one client and one server using apptainer run --userns ucx_perftest.sif. Without the PR, posix is selected and then open(/proc../fd/.) fails. With the PR, other TLs are selected if available; otherwise a transport availability error is reported. The process tree is similar to yours, but without mpirun for now.

bash
|- apptainer run --userns ucx_perftest
`- apptainer run --userns ucx_perftest

@yosefe
Contributor

yosefe commented Jul 16, 2023

/azp run

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s), but failed to run 1 pipeline(s).

@panda1100
Contributor

@tvegas1 -san,

I found the patch was not properly applied to benchmark.sif last time I tested against master and 1.10.1.
Today, I tested again and confirmed master + 9213.patch, 1.10.1 + 9213.patch works.

The only difference between master + 9213.patch and 1.10.1 + 9213.patch is the shared memory transport
used for self: master + 9213.patch uses sysv, but 1.10.1 + 9213.patch uses posix. The benchmark ran
correctly in both cases, though.

master + 9213.patch

UCX_LOG_LEVEL=info \
mpirun -np 64 -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=posix,sysv,cma,ib \
apptainer run --userns benchmark.sif
[1689561505.346108] [c7:317591:0]      ucp_worker.c:1871 UCX  INFO    \
0xd60d10 self cfg#0 tag(sysv/memory cma/memory)
[1689561505.346213] [c7:317541:0]      ucp_worker.c:1871 UCX  INFO    \
0x1d46e30 intra-node cfg#1 tag(sysv/memory rc_verbs/hfi1_0:1)

1.10.1 + 9213.patch

UCX_LOG_LEVEL=info \
mpirun -np 64 -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=hfi1_0:1 \
-x UCX_TLS=posix,sysv,cma,ib \
apptainer run --userns benchmark.sif
[1689562917.968402] [c7:566012:0]     ucp_worker.c:1720 UCX  INFO  \
ep_cfg[0]: tag(posix/memory cma/memory rc_verbs/hfi1_0:1); 
[1689562917.969369] [c7:566282:0]     ucp_worker.c:1720 UCX  INFO  \
ep_cfg[1]: tag(sysv/memory rc_verbs/hfi1_0:1); 

@tvegas1
Contributor Author

tvegas1 commented Jul 31, 2023

apptainer/apptainer#769

@panda1100
Contributor

@tvegas1 apptainer/apptainer#1583
This one is also related.

@tvegas1
Contributor Author

tvegas1 commented Jul 31, 2023

This one is also related.

Thanks for testing and updating with the latest status. With the spec file you provided, I was also able to confirm that the PR allows transport fallback when using apptainer with --userns.

@tvegas1 tvegas1 added the WIP-DNM Work in progress / Do not review label Aug 17, 2023
@tvegas1 tvegas1 removed the WIP-DNM Work in progress / Do not review label Aug 18, 2023
@@ -1014,6 +1015,63 @@ test_memtrack() {
UCX_MEMTRACK_DEST=stdout UCX_HANDLE_ERRORS=none UCX_MEMTRACK_LIMIT=412MB ./test/apps/test_memtrack_limit |& grep -C 100 'reached'
}

test_namespace() {
Contributor

why is it part of test_jenkins.sh and not a separate script?

Contributor Author

now moved to test_namespace.sh


echo "==== Running perftest namespace positive tests ===="

export UCX_LOG_LEVEL="debug"
Contributor

why needed?

Contributor Author

removed the line (the intent was to make the logs verbose, dumping lanes, to ease debugging of any future issue)

cmd="$perftest $args -p $server_port"
step_server_port
$cmd &
sleep 5;
Contributor

why need ; ?

Contributor Author

removed

export UCX_TLS="$tl,sysv"
cmd="$perftest $args -p $server_port"
step_server_port
$cmd &
Contributor

why need to use the local var cmd here?

Contributor Author

trying to reuse it, since we use the same command line twice

export UCX_LOG_LEVEL="debug"

echo "==== Running perftest default and non-default USER namespace test for posix ===="
export UCX_TLS="$tl,sysv"
Contributor

we can do "UCX_TLS=... command" without export

Contributor Author

added UCX_TLS in front of each command


for tl in posix cma; do
echo "==== Running perftest different PID namespace test for $tl ===="
user=$(whoami)
Contributor

can use the var $USER

Contributor Author

fixed

"sudo unshare --pid --fork sudo -u $user UCX_TLS=$tl,sysv UCX_LOG_LEVEL=$UCX_LOG_LEVEL $perftest" "$args" "127.0.0.1" 0 0
done

for tl in posix cma; do
Contributor

"do" in next line

Contributor Author

fixed

echo "==== Running perftest different PID namespace test for $tl ===="
user=$(whoami)
run_client_server_app \
"sudo unshare --pid --fork sudo -u $user UCX_TLS=$tl,sysv UCX_LOG_LEVEL=$UCX_LOG_LEVEL $perftest" "$args" "127.0.0.1" 0 0
Contributor

why need sudo?

Contributor Author

unshare --pid needs privileges

Comment on lines 537 to 538
has_ns = !ucs_sys_ns_is_default(UCS_SYS_NS_TYPE_PID) ||
         !ucs_sys_ns_is_default(UCS_SYS_NS_TYPE_USER);
Contributor

do we really have to pass user NS in the address?
If PID ns is the same, couldn't we try to open the remote file and if we could not open (for any reason) consider the remote iface as not reachable? or use access(2) to check if have permissions to open?
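
A minimal sketch of the "try to open it" idea, assuming the peer's pid and fd number are already known to the caller. example_can_open_peer_fd is a hypothetical helper, not the code that was eventually merged:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if /proc/<pid>/fd/<fd> can be opened
 * read-only by the calling process, 0 otherwise (for any reason).
 */
static int example_can_open_peer_fd(pid_t peer_pid, int peer_fd)
{
    char path[64];
    int fd;

    assert((peer_pid > 0) && (peer_fd >= 0));

    snprintf(path, sizeof(path), "/proc/%ld/fd/%d", (long)peer_pid, peer_fd);
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return 0; /* any failure is treated as "not reachable" here */
    }

    close(fd);
    return 1;
}
```

Treating every open failure as unreachable keeps the check simple, at the cost of hiding the specific errno from the caller.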

Contributor Author

the remote file path contains the pid and peer_fd, and it is available during unpack, so it seems it arrives after the is-reachable check/lane setup.

Contributor Author

as discussed, now checking directly the capability to posix open

Comment on lines 74 to 82
static inline int uct_cma_has_vm_perms(pid_t UCS_V_UNUSED peer_pid)
{
#ifdef HAVE_LINUX_KCMP_H
    return (syscall(SYS_kcmp, getpid(), peer_pid, KCMP_VM, 0, 0) >= 0) ||
           (errno != EPERM);
#else
    return 1; /* Try anyways */
#endif
}
Contributor

maybe we can try process_vm_readv of some dummy address?
I would assume that if the process is accessible we would either be successful or get EFAULT, and if it's not accessible it would be EPERM.
This way we would not depend on a new kernel to check the readability
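
The suggestion could be sketched roughly as follows. Assumptions: the names are hypothetical, address 0 is taken as the dummy address, and every error other than EFAULT is treated as unreachable, which is a choice made here rather than something stated in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Classify a process_vm_readv() result as suggested in the comment:
 * success or EFAULT means the peer is accessible (1), anything else is
 * treated as not accessible (0).
 */
static int example_vm_result_reachable(long ret, int err)
{
    return (ret >= 0) || (err == EFAULT);
}

/* Hypothetical probe: try to read one byte from a dummy address in peer_pid */
static int example_probe_peer_vm(pid_t peer_pid)
{
    char byte;
    struct iovec local  = { .iov_base = &byte, .iov_len = 1 };
    /* Dummy remote address: NULL is assumed to be unmapped in the peer */
    struct iovec remote = { .iov_base = NULL, .iov_len = 1 };
    long ret;

    assert(peer_pid > 0);

    /* Raw syscall; glibc also provides a process_vm_readv() wrapper */
    ret = syscall(SYS_process_vm_readv, (long)peer_pid, &local, 1UL,
                  &remote, 1UL, 0UL);
    return example_vm_result_reachable(ret, errno);
}
```

The remote length is kept non-zero on purpose, since a zero-length request may return before the kernel performs any permission check on the peer.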

Contributor Author

@tvegas1 tvegas1 Aug 21, 2023

fixed (actually opening it seems stronger than checking access, and code is reused)

yosefe previously approved these changes Aug 24, 2023
Contributor

@yosefe yosefe left a comment

LGTM besides Mikhail's comments

@yosefe
Contributor

yosefe commented Aug 24, 2023

@tvegas1 currently CI looks like this:

Can we place the new tests outside of "Tests" list, call them "Namespace Tests", so it will be the same level as "Wire compat" and "io_demo"?

brminich previously approved these changes Aug 24, 2023
yosefe previously approved these changes Aug 24, 2023
Contributor

@yosefe yosefe left a comment

pls squash

@yosefe
Contributor

yosefe commented Aug 24, 2023

@tvegas1 seems there is a conflict now, can u pls resolve by merge commit?

@tvegas1 tvegas1 dismissed stale reviews from brminich and yosefe via 2735079 August 24, 2023 11:16
@yosefe yosefe enabled auto-merge August 25, 2023 07:05
@yosefe yosefe merged commit 5f00157 into openucx:master Aug 25, 2023
113 checks passed
5 participants