OpenMPI bogus warnings "UCX is unable to handle VM_UNMAP" #3686

I am running:

and OpenMPI 4.0.1 and GCC 9.1.0.

I am running on a local machine, which of course does not make sense with all the options, but configure just disables the extended features.

However, when running the simplest MPI program (only MPI_Init and MPI_Finalize) I get the following error:

$> ipcs -l

------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398442373116
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 32000
max semaphores per array = 32000
max semaphores system wide = 1024000000
max ops per semop call = 500
semaphore max value = 32767

It seems unrelated to #3023 and #3668 since this is a local machine.
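The test program itself was not included in the report; a minimal sketch of what a program containing only these two calls presumably looks like (the file name test.c and the build line are assumptions, not from the report):

/* test.c -- assumed minimal reproducer: init and finalize only */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);   /* bring up the MPI runtime             */
    MPI_Finalize();           /* tear it down again; no communication */
    return 0;
}

Built with something like mpicc test.c -o test and run via the mpirun commands quoted below.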
Comments
Actually, it seems like #3023.
$> capsh --print|grep ipc
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read

I'll check with master.
@zerothi v1.5.x was branched before the fix (branched in Nov'18; the fix was in Jan'19).
Ah, ok. :)
@yosefe, do we want to backport this to v1.5.2?
@yosefe thanks for following up! I just tried with 1.6.0-rc2. It seems to be resolved. However, now I get:

[nicpa-dtu:24568] ../../../../../opal/mca/common/ucx/common_ucx.c:146 Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. Pls try adding --mca opal_common_ucx_opal_mem_hooks 1 to mpirun/oshrun command line to resolve this issue.
[nicpa-dtu:24569] ../../../../../opal/mca/common/ucx/common_ucx.c:146 Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. Pls try adding --mca opal_common_ucx_opal_mem_hooks 1 to mpirun/oshrun command line to resolve this issue.

This could be an OMPI issue, but it seems related to UCX? Even with the proposed flag added, I still get the warning.
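Background on what the warning means: UCX's UCM layer hooks munmap-like calls so it is notified (a VM_UNMAP event) when memory it may have registered is returned to the OS; if the hooks cannot be installed, stale registrations can survive, hence the data-corruption warning. A rough sketch of the event API involved, assuming ucm_set_event_handler and UCM_EVENT_VM_UNMAPPED from ucm/api/ucm.h (names and signatures recalled from UCX around v1.5/1.6 and may differ between versions; link with -lucm):

/* vm_unmap_demo.c -- sketch only; assumes the UCM API described above */
#include <stdio.h>
#include <sys/mman.h>
#include <ucm/api/ucm.h>

static void unmap_cb(ucm_event_type_t type, ucm_event_t *event, void *arg)
{
    /* UCX uses callbacks like this to invalidate registration caches */
    printf("VM_UNMAP at %p (%zu bytes)\n",
           event->vm_unmapped.address, event->vm_unmapped.size);
}

int main(void)
{
    /* subscribe to unmap events; priority 0 is an arbitrary choice here */
    ucm_set_event_handler(UCM_EVENT_VM_UNMAPPED, 0, unmap_cb, NULL);

    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    munmap(buf, 4096);   /* should trigger unmap_cb if the hooks installed */
    return 0;
}

As far as one can tell from common_ucx.c, the --mca opal_common_ucx_opal_mem_hooks 1 switch makes Open MPI forward its own memory hooks to UCX instead of relying on UCX installing hooks like these itself.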
I did this:

$> mpirun --mca opal_common_ucx_opal_mem_hooks 1 --mca btl ^uct -np 2 ./test

and got the same:

[nicpa-dtu:29789] ../../../../../opal/mca/common/ucx/common_ucx.c:146 Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption.
[nicpa-dtu:29790] ../../../../../opal/mca/common/ucx/common_ucx.c:146 Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption.
@yosefe should I move this to the mentioned ticket?
@zerothi can you try just this:

There is also another issue (fixed in UCX master; @hoopoepg will port it to v1.6.x as well): when you pass '--mca opal_common_ucx_opal_mem_hooks 1' as recommended by the warning message, the warning still shows.
I get the same output:

$> mpirun --mca btl ^uct -np 2 ./test
[nicpa-dtu:20776] ../../../../../opal/mca/common/ucx/common_ucx.c:146 Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. Pls try adding --mca opal_common_ucx_opal_mem_hooks 1 to mpirun/oshrun command line to resolve this issue.
[nicpa-dtu:20777] ../../../../../opal/mca/common/ucx/common_ucx.c:146 Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. Pls try adding --mca opal_common_ucx_opal_mem_hooks 1 to mpirun/oshrun command line to resolve this issue.
You can see my gist if it helps?
Probably some other OpenMPI component is initializing the memory patcher framework, which overrides the UCX hooks. Can you pls try:

Also, is it possible to provide the config.log for OpenMPI?
@yosefe same thing :(

$> mpirun --mca btl self -np 2 ./test
[nicpa-dtu:21644] ../../../../../opal/mca/common/ucx/common_ucx.c:146 Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. Pls try adding --mca opal_common_ucx_opal_mem_hooks 1 to mpirun/oshrun command line to resolve this issue.
[nicpa-dtu:21645] ../../../../../opal/mca/common/ucx/common_ucx.c:146 Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. Pls try adding --mca opal_common_ucx_opal_mem_hooks 1 to mpirun/oshrun command line to resolve this issue.

I have amended the gist with ompi-config.log as well.
@zerothi unfortunately, I've tried to reproduce the issue with the same versions and configuration on my system, with no luck. Can you pls rebuild UCX with debug like this (--disable-logging replaced by --enable-logging)? And then run like this:

This should produce some logging output to help identify the problem.
I got this:

[1560801008.017671] [nicpa-dtu:585] install.c:124 UCX DEBUG mmap test: got 0x0 out of 0x2007f
[1560801008.017669] [nicpa-dtu:586] install.c:124 UCX DEBUG mmap test: got 0x0 out of 0x2007f
[nicpa-dtu:00585] ../../../../../opal/mca/common/ucx/common_ucx.c:146 Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. Pls try adding --mca opal_common_ucx_opal_mem_hooks 1 to mpirun/oshrun command line to resolve this issue.
[nicpa-dtu:00586] ../../../../../opal/mca/common/ucx/common_ucx.c:146 Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. Pls try adding --mca opal_common_ucx_opal_mem_hooks 1 to mpirun/oshrun command line to resolve this issue.

EDIT: hidden because I had only added

$> mpirun -x UCX_MEM_LOG_LEVEL=debug -mca btl self -n 2 ./test
[1560802687.976514] [nicpa-dtu:1701] install.c:124 UCX DEBUG mmap test: got 0x0 out of 0x2007f
[1560802687.976560] [nicpa-dtu:1700] install.c:124 UCX DEBUG mmap test: got 0x0 out of 0x2007f
[nicpa-dtu:01701] ../../../../../opal/mca/common/ucx/common_ucx.c:146 Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. Pls try adding --mca opal_common_ucx_opal_mem_hooks 1 to mpirun/oshrun command line to resolve this issue.
[nicpa-dtu:01700] ../../../../../opal/mca/common/ucx/common_ucx.c:146 Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. Pls try adding --mca opal_common_ucx_opal_mem_hooks 1 to mpirun/oshrun command line to resolve this issue.

Here is the header of my config.log:

$ /home/nicpa/installation/bash-build/.compile/ucx-1.6.0/contrib/../configure --enable-optimizations --disable-logging --disable-debug --disable-assertions --disable-params-check --enable-logging --enable-optimizations --disable-debug --disable--assertions --disable-param-check --with-rc --with-ud --with-dc --with-dm --with-mcpu --with-march --prefix=/opt/gnu/9.1.0/ucx/1.6.0
@yosefe If you need anything more, please do not hesitate to contact me :)
@zerothi I guess you don't have any IB/RDMA device, or knem driver, right?
Correct, I don't have anything on my machine. :)
My suggestion: you could actually run CI tests on Travis/Azure/... for such a no-hardware case? (Just an idea.)
@zerothi FYI, ucx v1.6.0-rc3 contains a fix for this issue. Any chance you can give it a try?
I am running the installation! Will return! ;) Thanks
Now I get:

$> mpirun -np 2 ./test
<nothing>

$> mpirun -x UCX_MEM_LOG_LEVEL=debug -mca btl self -n 2 ./test
[1561016808.820435] [nicpa-dtu:8984] install.c:192 UCX DEBUG mmap test: got 0x0 out of 0x0
[1561016808.820435] [nicpa-dtu:8985] install.c:192 UCX DEBUG mmap test: got 0x0 out of 0x0

I guess this means it can be closed!