While testing the PSM2 library, I found that a deadlock is triggered when free() is called with an invalid address. This is with OMPI 1.10.1.

It looks like opal_memory_ptmalloc2_free() locks a mutex before calling opal_memory_ptmalloc2_int_free(). A segfault then occurs, the signal handler is called, and opal_memory_ptmalloc2_free() is called again from inside the signal handler (via exit() and opal_class_finalize(), frames #1 to #6), where it waits forever on the mutex locked by the first call. See the stack trace below:
#0 0x00002afb2bc7e99d in nanosleep () from /lib64/libpthread.so.0
#1 0x00002afb2c569005 in opal_memory_ptmalloc2_free ()
from /usr/mpi/gcc/openmpi-1.10.0-hfi/lib64/libopen-pal.so.13
#2 0x00002afb2c4ee87f in opal_class_finalize ()
from /usr/mpi/gcc/openmpi-1.10.0-hfi/lib64/libopen-pal.so.13
#3 0x00002afb2b782b5a in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#4 0x00002afb2bec4e49 in __run_exit_handlers () from /lib64/libc.so.6
#5 0x00002afb2bec4e95 in exit () from /lib64/libc.so.6
#6 0x00002afb320fc2ca in hfi_sighdlr (sig=11, p1=<optimized out>,
ucv=<optimized out>)
at opa_debug.c:190
#7 <signal handler called>
#8 0x00002afb2c568b2a in opal_memory_ptmalloc2_int_free ()
from /usr/mpi/gcc/openmpi-1.10.0-hfi/lib64/libopen-pal.so.13
#9 0x00002afb2c569053 in opal_memory_ptmalloc2_free ()
from /usr/mpi/gcc/openmpi-1.10.0-hfi/lib64/libopen-pal.so.13
#10 0x00002afb320edadc in ips_free_epaddr (epaddr=0x2afb34755540) at
ips_proto_connect.c:634
#11 ips_proto_disconnect (proto=proto@entry=0x1aeb180, force=force@entry=0,
numep=numep@entry=5160,
array_of_epaddr=array_of_epaddr@entry=0x2afb3ea8a290,
array_of_epaddr_mask=array_of_epaddr_mask@entry=0x2afb3ea80130,
array_of_errors=array_of_errors@entry=0x2afb3ea851e0,
timeout_in=timeout_in@entry=52000000000)
at ips_proto_connect.c:1439
#12 0x00002afb320e6e84 in ips_proto_fini (proto=proto@entry=0x1aeb180, force=0,
timeout_in=52000000000)
at ips_proto.c:641
#13 0x00002afb320e001f in ips_ptl_fini (ptl=0x1aeb040, force=<optimized out>,
timeout_in=<optimized out>)
at ptl.c:433
#14 0x00002afb320d36dd in __psm2_ep_close (ep=0x1aeac80, mode=0,
timeout_in=52000000000) at psm_ep.c:1107
#15 0x00002afb31ec0b17 in ompi_mtl_psm2_finalize ()
from /usr/mpi/gcc/openmpi-1.10.0-hfi/lib64/openmpi/mca_mtl_psm2.so
#16 0x00002afb2b9dcd12 in ompi_mpi_finalize () from
/usr/mpi/gcc/openmpi-1.10.0-hfi/lib64/libmpi.so.12
#17 0x0000000000400d42 in main ()
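The re-entrancy problem in the trace can be modeled without Open MPI at all. The following is a minimal sketch, not the actual ptmalloc2 code: `arena_lock`, `try_lock`, `unlock`, `reentrant_free_would_block`, and `outer_free_then_fault` are all hypothetical names, and a C11 `atomic_flag` stands in for whatever lock the allocator really uses. The inner call plays the role of the free() reached from the signal handler, and it only probes the lock instead of spinning on it, so the sketch reports the deadlock rather than hanging.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the allocator lock taken by the outer free(). */
static atomic_flag arena_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was acquired, false if it was already held. */
static bool try_lock(void)
{
    return !atomic_flag_test_and_set(&arena_lock);
}

static void unlock(void)
{
    atomic_flag_clear(&arena_lock);
}

/* Models the free() that runs from the signal handler's exit() path.
 * A real implementation would wait here; this sketch only reports
 * whether that wait could ever finish. */
static bool reentrant_free_would_block(void)
{
    if (try_lock()) {
        unlock();
        return false;   /* lock was free, no deadlock */
    }
    return true;        /* lock is held by the interrupted caller */
}

/* Models the outer free(): take the lock, then "fault" while holding it.
 * The direct call below stands in for SIGSEGV -> handler -> exit() -> free(). */
static bool outer_free_then_fault(void)
{
    bool blocked;

    if (!try_lock())
        return true;
    blocked = reentrant_free_would_block();
    unlock();
    return blocked;
}
```

Because the interrupted frame and the handler run on the same thread, the outer call can never release the lock while the inner one is waiting, which matches the wait seen in frames #0 and #1.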
matcabral changed the title from "opal_memory_ptmalloc2_int_free() hanging when invalid address provided" to "opal_memory_ptmalloc2_free() hanging when invalid address provided" on Jan 19, 2016
Yes, thanks!
By definition, _exit() should resolve the hang. I haven't been able to reproduce the hang so far. In any case, this is not an OMPI issue, so I'm closing it.