Mujoco_py 2.1, always rebuilding in cluster #763

Closed
im-Kitsch opened this issue Feb 28, 2023 · 9 comments

Comments

@im-Kitsch

Hi,

I am trying to import mujoco-py, but it always rebuilds when I submit a job via Slurm. It always prints "mujoco_py/cymj.pyx because it changed" and then fails with the error "cannot find -lGL: No such file or directory".

I installed mujoco_py using the method described in issue #627. It runs on the login node without any problem, but it fails when I submit a job via Slurm.

I installed the dependencies with:

conda install -c conda-forge glew
conda install -c conda-forge mesalib
conda install -c menpo glfw3

I can run mujoco-py 2.0* on the cluster without any problem, but mujoco_py 2.1* always rebuilds "cymj.pyx" when I import mujoco_py. I suspect mujoco-py checks the system environment incorrectly; it may be related to conda or to library linking. In any case, the rebuild should be avoidable. Is there a way to prevent this behavior?

Thanks a lot in advance.

To Reproduce

#!/bin/bash

#SBATCH -J test_job_16

#SBATCH -e /tmp/temp_test_mujoco/test_%x.%j.err
#SBATCH -o /tmp/temp_test_mujoco/test_%x.%j.out

conda activate tri7
python -c "import mujoco_py; print(mujoco_py.__version__)"

Then

sbatch job.sh

Error Messages

/home/user_id/miniconda3/envs/tri7/compiler_compat/ld: cannot find -lGL: No such file or directory
collect2: error: ld returned 1 exit status
Traceback (most recent call last):
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/unixccompiler.py", line 267, in link
    self.spawn(linker + ld_args)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/ccompiler.py", line 1007, in spawn
    spawn(cmd, dry_run=self.dry_run, **kwargs)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/spawn.py", line 70, in spawn
    raise DistutilsExecError(
distutils.errors.DistutilsExecError: command '/usr/bin/gcc' failed with exit code 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/__init__.py", line 2, in <module>
    from mujoco_py.builder import cymj, ignore_mujoco_warnings, functions, MujocoException
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/builder.py", line 504, in <module>
    cymj = load_cython_ext(mujoco_path)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/builder.py", line 110, in load_cython_ext
    cext_so_path = builder.build()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/builder.py", line 226, in build
    built_so_file_path = self._build_impl()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/builder.py", line 278, in _build_impl
    so_file_path = super()._build_impl()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/builder.py", line 249, in _build_impl
    dist.run_commands()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
    self.build_extensions()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/builder.py", line 149, in build_extensions
    build_ext.build_extensions(self)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
    _build_ext.build_ext.build_extensions(self)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 468, in build_extensions
    self._build_extensions_serial()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 494, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 573, in build_extension
    self.compiler.link_shared_object(
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/ccompiler.py", line 751, in link_shared_object
    self.link(
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/unixccompiler.py", line 269, in link
    raise LinkError(msg)
distutils.errors.LinkError: command '/usr/bin/gcc' failed with exit code 1
srun: error: mpsc0176: task 0: Exited with exit code 1
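
The `cannot find -lGL` message means the linker could not find an unversioned `libGL.so` (or `libGL.a`) in its search path; a versioned `libGL.so.1` alone does not satisfy `-lGL`. A minimal sketch of a check that could be run inside the Slurm job before importing mujoco_py follows. The helper name `find_libgl_link` and the example directories are assumptions for illustration, not part of mujoco_py.

```shell
#!/usr/bin/env bash
# find_libgl_link is a hypothetical helper: it prints the first directory
# among its arguments that contains an unversioned libGL.so and returns 0,
# or returns 1 if none of them does.
find_libgl_link() {
    local dir
    for dir in "$@"; do
        if [ -e "$dir/libGL.so" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example call (the directories are assumptions; adjust for your cluster):
# find_libgl_link /usr/lib64 /usr/lib/x86_64-linux-gnu "$CONDA_PREFIX/lib" \
#     || echo "no libGL.so found; linking with -lGL will likely fail" >&2
```

Running the check on both the login node and a compute node may show whether the two environments differ.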

Desktop (please complete the following information):

  • OS: Red Hat Enterprise Linux 8.7
  • Python Version 3.9
  • Mujoco Version 210
  • mujoco-py version 2.1.2.14
@interestingzhuo

+1

@gabrieletiboni

+1

@gabrieletiboni

@interestingzhuo @im-Kitsch have you figured anything out? I'm stuck with the same problem.

@im-Kitsch
Author

@interestingzhuo @im-Kitsch have you guys figured something out? I'm stuck with the same problem

No, unfortunately. I think this issue is beyond my ability.

@saran-t

saran-t commented Apr 23, 2023

Do you have any specific need for the old MuJoCo version?

@im-Kitsch
Author

Do you have any specific need for the old MuJoCo version?

Yes, many classical implementations are still based on the old MuJoCo and gym versions, for example stable-baselines3.

@gabrieletiboni

gabrieletiboni commented Apr 23, 2023

@interestingzhuo @im-Kitsch I got it to work!

The mujoco_py README has a couple of lines regarding this "cannot find -lGL" error:
[image: screenshot of the relevant README lines]

As I couldn't symlink without sudo rights, I:

  • located the libGL.so.1 file in my /usr lib dir (which in my cluster was in /usr/lib64)
  • copied it into my conda lib dir ($CONDA_PREFIX/lib in my case)
  • created the symlink there: ln -s $CONDA_PREFIX/lib/libGL.so.1 $CONDA_PREFIX/lib/libGL.so

I was then able to build mujoco_py on the cluster nodes.
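
The three steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: `install_libgl` is a hypothetical helper name, and the source directory (`/usr/lib64` on the commenter's cluster) may differ on other systems.

```shell
#!/usr/bin/env bash
# install_libgl is a hypothetical helper: it copies libGL.so.1 from a system
# directory into a destination directory and creates the unversioned
# libGL.so symlink next to it, without needing sudo.
install_libgl() {
    local src_dir="$1" dest_dir="$2"
    local src="$src_dir/libGL.so.1"
    if [ ! -e "$src" ]; then
        echo "libGL.so.1 not found in $src_dir" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    # -L follows symlinks, so the real library file is copied.
    cp -L "$src" "$dest_dir/libGL.so.1"
    # Relative link: libGL.so resolves to the copy inside dest_dir.
    ln -sfn libGL.so.1 "$dest_dir/libGL.so"
}

# Example call (paths as described in the comment; adjust for your cluster):
# install_libgl /usr/lib64 "$CONDA_PREFIX/lib"
```

The relative symlink target keeps the link valid even if the conda environment directory is later moved.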

PS: I'm also stuck with this MuJoCo version as I'm using stable-baselines3.

CREDITS:
Originally posted by @luckeciano in #627 (comment)

@im-Kitsch
Author

Hi @gabrieletiboni, thanks for the hint. Unfortunately, I tried copying libGL.so to $CONDA_PREFIX/lib, but it still doesn't work. I can build on the cluster login node, but it fails when I submit a job.

But congratulations on getting it to work on your node.

My output is as follows, in case anyone has the same issue:

gcc: fatal error: Killed signal terminated program cc1
compilation terminated.
Traceback (most recent call last):
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/unixccompiler.py", line 186, in _compile
    self.spawn(compiler_so + cc_args + [src, '-o', obj] + extra_postargs)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/ccompiler.py", line 1007, in spawn
    spawn(cmd, dry_run=self.dry_run, **kwargs)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/spawn.py", line 70, in spawn
    raise DistutilsExecError(
distutils.errors.DistutilsExecError: command '/usr/bin/gcc' failed with exit code 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zh50syxa/temp_test_mujoco/test_mujoco.py", line 1, in <module>
    import mujoco_py
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/__init__.py", line 2, in <module>
    from mujoco_py.builder import cymj, ignore_mujoco_warnings, functions, MujocoException
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/builder.py", line 504, in <module>
    cymj = load_cython_ext(mujoco_path)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/builder.py", line 110, in load_cython_ext
    cext_so_path = builder.build()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/builder.py", line 226, in build
    built_so_file_path = self._build_impl()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/builder.py", line 278, in _build_impl
    so_file_path = super()._build_impl()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/builder.py", line 249, in _build_impl
    dist.run_commands()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
    self.build_extensions()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/builder.py", line 149, in build_extensions
    build_ext.build_extensions(self)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
    _build_ext.build_ext.build_extensions(self)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 468, in build_extensions
    self._build_extensions_serial()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 494, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 549, in build_extension
    objects = self.compiler.compile(
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/ccompiler.py", line 599, in compile
    self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/unixccompiler.py", line 188, in _compile
    raise CompileError(msg)
distutils.errors.CompileError: command '/usr/bin/gcc' failed with exit code 1
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=41479462.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
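
The last line of this log shows that the compiler was killed by the cgroup out-of-memory handler, so this failure comes from the job's memory limit rather than from a missing library. One possible remedy, not confirmed in this thread, is to request more memory for the job that triggers the build. A sketch of such a job script follows; the job name and the `8G` value are assumptions, and `--mem` is the standard sbatch option for per-node memory.

```shell
#!/bin/bash

#SBATCH -J build_mujoco_py
#SBATCH --mem=8G   # assumed value, not taken from this thread; tune for your cluster

conda activate test38
python -c "import mujoco_py; print(mujoco_py.__version__)"
```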

@im-Kitsch
Author

Finally, I think I solved this using @gabrieletiboni's method.
