This repository has been archived by the owner on Dec 9, 2022. It is now read-only.
Avoid shared memory bug in openmpi>=3.1.2 #523
Comments
@leofang can you sync our recipe with the conda-forge one (which sadly for now is a copy-paste job 😞)?
Sure, happy to do so. But let me first check with the Open MPI people and see if this is a known bug or caused by other problems.
@tacaswell since you brought it up, I'd also like to copy and paste the conda-forge recipes for
@leofang Definitely - go ahead
leofang changed the title from "Avoid shared memory bug in openmpi v3.1.2" to "Avoid shared memory bug in openmpi>=3.1.2" on Feb 15, 2019
leofang added a commit to leofang/lightsource2-recipes that referenced this issue on Feb 15, 2019
leofang added a commit to leofang/lightsource2-recipes that referenced this issue on Feb 16, 2019
mrakitin pushed a commit that referenced this issue on Mar 1, 2019
Currently we build v3.1.2 in nsls2-tag: lightsource2-recipes/recipes-tag/openmpi/meta.yaml (line 2 in 87723db).
However, this version seems to be buggy. If one spawns a few MPI processes, lets them do some work, but terminates them abnormally (Ctrl-C and whatnot), one can see that in /dev/shm/ there will be shared memory segments related to Open MPI's vader component (don't ask me what this is...) that are not unlinked by Open MPI during the cleanup phase, and they will remain there until the system is rebooted, slowly eating up the system's memory!
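For the record, a quick way to check for and clean up the stale segments. This is a sketch run against a temporary directory standing in for /dev/shm so it is safe to try anywhere; the `vader_segment.*` name pattern is an assumption based on what the leftover files looked like on our machines:

```shell
# Stand-in for /dev/shm so this demo is safe to run anywhere
shm=$(mktemp -d)

# Simulate two leaked segments; the real names in /dev/shm looked
# roughly like vader_segment.<...> (assumed pattern, not authoritative)
touch "$shm/vader_segment.demo.0" "$shm/vader_segment.demo.1"

# Count the stale segments
ls "$shm" | grep -c '^vader_segment'   # prints 2

# Remove them by hand -- only do this when no MPI job is running!
rm -f "$shm"/vader_segment.*
ls "$shm" | grep -c '^vader_segment' || echo "clean"

rmdir "$shm"
```

On a real system you would point this at /dev/shm instead of the temp dir, and double-check nothing else matches the pattern before deleting.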
Based on Open MPI's changelog (see here), it seems that vader was reworked in v3.1.2; presumably this bug sneaked in then. There's a bug fix in v3.1.3 that will hopefully address this; if not, we can downgrade to v3.1.1, which I tested and which works without this issue. (UPDATE: v3.1.3 also has this problem; we have to use v3.1.1...)
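If v3.1.1 turns out to be the only safe choice, pinning it in the recipe is a small change. A sketch only, assuming the usual conda-build meta.yaml layout; the actual recipe file may name its fields slightly differently:

```yaml
# recipes-tag/openmpi/meta.yaml (sketch, not the actual recipe)
package:
  name: openmpi
  version: 3.1.1   # pin to the last version without the vader /dev/shm leak
```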
I have other questions related to building conda packages for mpi4py, openmpi, and mpich, but perhaps I should ask them somewhere else...