Can't install pandas on EMR cluster #140

Open
czapol opened this issue Aug 13, 2021 · 1 comment

czapol commented Aug 13, 2021

System Information

  • Spark 2.4.5
  • EMR cluster 5.30.1
  • SageMaker notebook with sparkmagic kernel

I am trying to install some additional Python libraries on an EMR cluster using the install_pypi_package API. A few months ago I had no problem installing pandas or sagemaker, but now I run into the long error below. I can still install some other libraries, such as boto3, without error.

Input:
sc.install_pypi_package("pandas")

Output:
FloatProgress(value=0.0, bar_style='info', description='Progress:', layout=Layout(height='25px', width='50%'),…
Collecting cython
Using cached https://files.pythonhosted.org/packages/3d/48/bbca549da0b0f636c0f161e84d30172c40aafe99552680f297da7fedf102/Cython-0.29.24-cp37-cp37m-manylinux1_x86_64.whl
Installing collected packages: cython
Successfully installed cython-0.29.24

Collecting pandas
Using cached https://files.pythonhosted.org/packages/12/01/360d7f444f910ae16496c07e3f003cb8c641b4ca6c033408a4469a904df3/pandas-1.3.1.tar.gz
Building wheels for collected packages: unknown, unknown
Running setup.py bdist_wheel for unknown: started
Running setup.py bdist_wheel for unknown: finished with status 'error'
Complete output from command /tmp/1628881028302-0/bin/python -u -c "import setuptools, tokenize;file='/mnt/tmp/pip-build-p0xe97tr/pandas/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d /tmp/tmprwhg6bvxpip-wheel- --python-tag cp37:
running bdist_wheel
running build
running build_ext
building 'pandas._libs.algos' extension
creating build
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/pandas
creating build/temp.linux-x86_64-3.7/pandas/_libs
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DNPY_NO_DEPRECATED_API=0 -I./pandas/_libs -Ipandas/_libs/src/klib -I/usr/local/lib64/python3.7/site-packages/numpy/core/include -I/usr/include/python3.7m -c pandas/_libs/algos.c -o build/temp.linux-x86_64-3.7/pandas/_libs/algos.o
pandas/_libs/algos.c:41:10: fatal error: Python.h: No such file or directory
#include "Python.h"
^~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1


Running setup.py clean for unknown
Running setup.py bdist_wheel for unknown: started
Running setup.py bdist_wheel for unknown: still running...
Running setup.py bdist_wheel for unknown: finished with status 'error'
Complete output from command /tmp/1628881028302-0/bin/python -u -c "import setuptools, tokenize;file='/mnt/tmp/pip-build-p0xe97tr/pandas/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d /tmp/tmp8zdwxxl1pip-wheel- --python-tag cp37:
Compiling pandas/_libs/algos.pyx because it changed.
Compiling pandas/_libs/arrays.pyx because it changed.
Compiling pandas/_libs/groupby.pyx because it changed.
Compiling pandas/_libs/hashing.pyx because it changed.
Compiling pandas/_libs/hashtable.pyx because it changed.
Compiling pandas/_libs/index.pyx because it changed.
Compiling pandas/_libs/indexing.pyx because it changed.
Compiling pandas/_libs/internals.pyx because it changed.
Compiling pandas/_libs/interval.pyx because it changed.
Compiling pandas/_libs/join.pyx because it changed.
Compiling pandas/_libs/lib.pyx because it changed.
Compiling pandas/_libs/missing.pyx because it changed.
Compiling pandas/_libs/parsers.pyx because it changed.
Compiling pandas/_libs/reduction.pyx because it changed.
Compiling pandas/_libs/ops.pyx because it changed.
Compiling pandas/_libs/ops_dispatch.pyx because it changed.
Compiling pandas/_libs/properties.pyx because it changed.
Compiling pandas/_libs/reshape.pyx because it changed.
Compiling pandas/_libs/sparse.pyx because it changed.
Compiling pandas/_libs/tslib.pyx because it changed.
Compiling pandas/_libs/tslibs/base.pyx because it changed.
Compiling pandas/_libs/tslibs/ccalendar.pyx because it changed.
Compiling pandas/_libs/tslibs/dtypes.pyx because it changed.
Compiling pandas/_libs/tslibs/conversion.pyx because it changed.
Compiling pandas/_libs/tslibs/fields.pyx because it changed.
Compiling pandas/_libs/tslibs/nattype.pyx because it changed.
Compiling pandas/_libs/tslibs/np_datetime.pyx because it changed.
Compiling pandas/_libs/tslibs/offsets.pyx because it changed.
Compiling pandas/_libs/tslibs/parsing.pyx because it changed.
Compiling pandas/_libs/tslibs/period.pyx because it changed.
Compiling pandas/_libs/tslibs/strptime.pyx because it changed.
Compiling pandas/_libs/tslibs/timedeltas.pyx because it changed.
Compiling pandas/_libs/tslibs/timestamps.pyx because it changed.
Compiling pandas/_libs/tslibs/timezones.pyx because it changed.
Compiling pandas/_libs/tslibs/tzconversion.pyx because it changed.
Compiling pandas/_libs/tslibs/vectorized.pyx because it changed.
Compiling pandas/_libs/testing.pyx because it changed.
Compiling pandas/_libs/window/aggregations.pyx because it changed.
Compiling pandas/_libs/window/indexers.pyx because it changed.
Compiling pandas/_libs/writers.pyx because it changed.
Compiling pandas/io/sas/sas.pyx because it changed.
[ 1/41] Cythonizing pandas/_libs/algos.pyx
[ 2/41] Cythonizing pandas/_libs/arrays.pyx
[ 3/41] Cythonizing pandas/_libs/groupby.pyx
[ 4/41] Cythonizing pandas/_libs/hashing.pyx
[ 5/41] Cythonizing pandas/_libs/hashtable.pyx
[ 6/41] Cythonizing pandas/_libs/index.pyx
[ 7/41] Cythonizing pandas/_libs/indexing.pyx
[ 8/41] Cythonizing pandas/_libs/internals.pyx
[ 9/41] Cythonizing pandas/_libs/interval.pyx
[10/41] Cythonizing pandas/_libs/join.pyx
[11/41] Cythonizing pandas/_libs/lib.pyx
[12/41] Cythonizing pandas/_libs/missing.pyx
[13/41] Cythonizing pandas/_libs/ops.pyx
[14/41] Cythonizing pandas/_libs/ops_dispatch.pyx
[15/41] Cythonizing pandas/_libs/parsers.pyx
[16/41] Cythonizing pandas/_libs/properties.pyx
[17/41] Cythonizing pandas/_libs/reduction.pyx
[18/41] Cythonizing pandas/_libs/reshape.pyx
[19/41] Cythonizing pandas/_libs/sparse.pyx
[20/41] Cythonizing pandas/_libs/testing.pyx
[21/41] Cythonizing pandas/_libs/tslib.pyx
[22/41] Cythonizing pandas/_libs/tslibs/base.pyx
[23/41] Cythonizing pandas/_libs/tslibs/ccalendar.pyx
[24/41] Cythonizing pandas/_libs/tslibs/conversion.pyx
[25/41] Cythonizing pandas/_libs/tslibs/dtypes.pyx
[26/41] Cythonizing pandas/_libs/tslibs/fields.pyx
[27/41] Cythonizing pandas/_libs/tslibs/nattype.pyx
[28/41] Cythonizing pandas/_libs/tslibs/np_datetime.pyx
[29/41] Cythonizing pandas/_libs/tslibs/offsets.pyx
[30/41] Cythonizing pandas/_libs/tslibs/parsing.pyx
[31/41] Cythonizing pandas/_libs/tslibs/period.pyx
[32/41] Cythonizing pandas/_libs/tslibs/strptime.pyx
[33/41] Cythonizing pandas/_libs/tslibs/timedeltas.pyx
[34/41] Cythonizing pandas/_libs/tslibs/timestamps.pyx
[35/41] Cythonizing pandas/_libs/tslibs/timezones.pyx
[36/41] Cythonizing pandas/_libs/tslibs/tzconversion.pyx
[37/41] Cythonizing pandas/_libs/tslibs/vectorized.pyx
[38/41] Cythonizing pandas/_libs/window/aggregations.pyx
[39/41] Cythonizing pandas/_libs/window/indexers.pyx
[40/41] Cythonizing pandas/_libs/writers.pyx
[41/41] Cythonizing pandas/io/sas/sas.pyx
running bdist_wheel
running build
running build_ext
building 'pandas._libs.algos' extension
creating build
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/pandas
creating build/temp.linux-x86_64-3.7/pandas/_libs
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DNPY_NO_DEPRECATED_API=0 -I./pandas/_libs -Ipandas/_libs/src/klib -I/usr/local/lib64/python3.7/site-packages/numpy/core/include -I/usr/include/python3.7m -c pandas/_libs/algos.c -o build/temp.linux-x86_64-3.7/pandas/_libs/algos.o
pandas/_libs/algos.c:41:10: fatal error: Python.h: No such file or directory
#include "Python.h"
^~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1


Running setup.py clean for unknown
Failed to build unknown unknown
Installing collected packages: unknown
Running setup.py install for unknown: started
Running setup.py install for unknown: still running...
Running setup.py install for unknown: finished with status 'error'
Complete output from command /tmp/1628881028302-0/bin/python -u -c "import setuptools, tokenize;file='/mnt/tmp/pip-build-p0xe97tr/pandas/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-ie8cw1pj-record/install-record.txt --single-version-externally-managed --compile --install-headers /tmp/1628881028302-0/include/site/python3.7/unknown:
Compiling pandas/_libs/algos.pyx because it changed.
Compiling pandas/_libs/arrays.pyx because it changed.
Compiling pandas/_libs/groupby.pyx because it changed.
Compiling pandas/_libs/hashing.pyx because it changed.
Compiling pandas/_libs/hashtable.pyx because it changed.
Compiling pandas/_libs/index.pyx because it changed.
Compiling pandas/_libs/indexing.pyx because it changed.
Compiling pandas/_libs/internals.pyx because it changed.
Compiling pandas/_libs/interval.pyx because it changed.
Compiling pandas/_libs/join.pyx because it changed.
Compiling pandas/_libs/lib.pyx because it changed.
Compiling pandas/_libs/missing.pyx because it changed.
Compiling pandas/_libs/parsers.pyx because it changed.
Compiling pandas/_libs/reduction.pyx because it changed.
Compiling pandas/_libs/ops.pyx because it changed.
Compiling pandas/_libs/ops_dispatch.pyx because it changed.
Compiling pandas/_libs/properties.pyx because it changed.
Compiling pandas/_libs/reshape.pyx because it changed.
Compiling pandas/_libs/sparse.pyx because it changed.
Compiling pandas/_libs/tslib.pyx because it changed.
Compiling pandas/_libs/tslibs/base.pyx because it changed.
Compiling pandas/_libs/tslibs/ccalendar.pyx because it changed.
Compiling pandas/_libs/tslibs/dtypes.pyx because it changed.
Compiling pandas/_libs/tslibs/conversion.pyx because it changed.
Compiling pandas/_libs/tslibs/fields.pyx because it changed.
Compiling pandas/_libs/tslibs/nattype.pyx because it changed.
Compiling pandas/_libs/tslibs/np_datetime.pyx because it changed.
Compiling pandas/_libs/tslibs/offsets.pyx because it changed.
Compiling pandas/_libs/tslibs/parsing.pyx because it changed.
Compiling pandas/_libs/tslibs/period.pyx because it changed.
Compiling pandas/_libs/tslibs/strptime.pyx because it changed.
Compiling pandas/_libs/tslibs/timedeltas.pyx because it changed.
Compiling pandas/_libs/tslibs/timestamps.pyx because it changed.
Compiling pandas/_libs/tslibs/timezones.pyx because it changed.
Compiling pandas/_libs/tslibs/tzconversion.pyx because it changed.
Compiling pandas/_libs/tslibs/vectorized.pyx because it changed.
Compiling pandas/_libs/testing.pyx because it changed.
Compiling pandas/_libs/window/aggregations.pyx because it changed.
Compiling pandas/_libs/window/indexers.pyx because it changed.
Compiling pandas/_libs/writers.pyx because it changed.
Compiling pandas/io/sas/sas.pyx because it changed.
[ 1/41] Cythonizing pandas/_libs/algos.pyx
[ 2/41] Cythonizing pandas/_libs/arrays.pyx
[ 3/41] Cythonizing pandas/_libs/groupby.pyx
[ 4/41] Cythonizing pandas/_libs/hashing.pyx
[ 5/41] Cythonizing pandas/_libs/hashtable.pyx
[ 6/41] Cythonizing pandas/_libs/index.pyx
[ 7/41] Cythonizing pandas/_libs/indexing.pyx
[ 8/41] Cythonizing pandas/_libs/internals.pyx
[ 9/41] Cythonizing pandas/_libs/interval.pyx
[10/41] Cythonizing pandas/_libs/join.pyx
[11/41] Cythonizing pandas/_libs/lib.pyx
[12/41] Cythonizing pandas/_libs/missing.pyx
[13/41] Cythonizing pandas/_libs/ops.pyx
[14/41] Cythonizing pandas/_libs/ops_dispatch.pyx
[15/41] Cythonizing pandas/_libs/parsers.pyx
[16/41] Cythonizing pandas/_libs/properties.pyx
[17/41] Cythonizing pandas/_libs/reduction.pyx
[18/41] Cythonizing pandas/_libs/reshape.pyx
[19/41] Cythonizing pandas/_libs/sparse.pyx
[20/41] Cythonizing pandas/_libs/testing.pyx
[21/41] Cythonizing pandas/_libs/tslib.pyx
[22/41] Cythonizing pandas/_libs/tslibs/base.pyx
[23/41] Cythonizing pandas/_libs/tslibs/ccalendar.pyx
[24/41] Cythonizing pandas/_libs/tslibs/conversion.pyx
[25/41] Cythonizing pandas/_libs/tslibs/dtypes.pyx
[26/41] Cythonizing pandas/_libs/tslibs/fields.pyx
[27/41] Cythonizing pandas/_libs/tslibs/nattype.pyx
[28/41] Cythonizing pandas/_libs/tslibs/np_datetime.pyx
[29/41] Cythonizing pandas/_libs/tslibs/offsets.pyx
[30/41] Cythonizing pandas/_libs/tslibs/parsing.pyx
[31/41] Cythonizing pandas/_libs/tslibs/period.pyx
[32/41] Cythonizing pandas/_libs/tslibs/strptime.pyx
[33/41] Cythonizing pandas/_libs/tslibs/timedeltas.pyx
[34/41] Cythonizing pandas/_libs/tslibs/timestamps.pyx
[35/41] Cythonizing pandas/_libs/tslibs/timezones.pyx
[36/41] Cythonizing pandas/_libs/tslibs/tzconversion.pyx
[37/41] Cythonizing pandas/_libs/tslibs/vectorized.pyx
[38/41] Cythonizing pandas/_libs/window/aggregations.pyx
[39/41] Cythonizing pandas/_libs/window/indexers.pyx
[40/41] Cythonizing pandas/_libs/writers.pyx
[41/41] Cythonizing pandas/io/sas/sas.pyx
running install
running build
running build_ext
building 'pandas._libs.algos' extension
creating build
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/pandas
creating build/temp.linux-x86_64-3.7/pandas/_libs
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DNPY_NO_DEPRECATED_API=0 -I./pandas/_libs -Ipandas/_libs/src/klib -I/usr/local/lib64/python3.7/site-packages/numpy/core/include -I/usr/include/python3.7m -c pandas/_libs/algos.c -o build/temp.linux-x86_64-3.7/pandas/_libs/algos.o
pandas/_libs/algos.c:41:10: fatal error: Python.h: No such file or directory
#include "Python.h"
^~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1

----------------------------------------

Running setup.py (path:/mnt/tmp/pip-build-p0xe97tr/pandas/setup.py) egg_info for package pandas produced metadata for project name unknown. Fix your #egg=pandas fragments.
Failed building wheel for unknown
Failed building wheel for unknown
Command "/tmp/1628881028302-0/bin/python -u -c "import setuptools, tokenize;file='/mnt/tmp/pip-build-p0xe97tr/pandas/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-ie8cw1pj-record/install-record.txt --single-version-externally-managed --compile --install-headers /tmp/1628881028302-0/include/site/python3.7/unknown" failed with error code 1 in /mnt/tmp/pip-build-p0xe97tr/pandas/
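
The gcc step in the log fails because Python.h is not found, which suggests pip fell back to building pandas from the source tarball on a node without the Python development headers. A minimal sketch for checking that from a notebook cell, assuming the cell runs in the same interpreter that pip uses (the helper names python_header_path and has_python_headers are made up for illustration):

```python
import os
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects Python.h to live."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")


def has_python_headers():
    """True if Python.h exists, so C extensions could be compiled from source."""
    return os.path.isfile(python_header_path())


# Report the expected header location and whether the file is present.
print(python_header_path(), "found" if has_python_headers() else "missing")
```

If this reports "missing", any package that pip has to compile from source will probably fail the same way, while packages that install from a prebuilt wheel should be unaffected.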

@czapol czapol changed the title Can't install sagemaker on EMR cluster Can't install pandas on EMR cluster Aug 13, 2021

dacort commented Oct 7, 2021

@czapol Came across this while working on something similar.

Maybe try installing an older version of pandas: sc.install_pypi_package("pandas==1.2.5"). I think the 1.3.x series is incompatible with the version of pip that's on EMR.

You could also try sc.uninstall_package('pip') and sc.install_pypi_package("pip==21.2.4") before installing pandas, but I think the version of numpy that pandas requires is incompatible with the version that sagemaker-pyspark relies on.
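
A rough sketch of the pinned install as a small helper, in case it is useful for other packages too. The names pinned_spec and install_pinned are hypothetical; the only call it relies on is sc.install_pypi_package, as used above, and sc is assumed to be the SparkContext that the sparkmagic kernel provides:

```python
def pinned_spec(package, version):
    """Build a pip requirement string such as 'pandas==1.2.5'."""
    return f"{package}=={version}"


def install_pinned(sc, package, version):
    """Install an exact package version through the notebook-scoped API.

    `sc` is assumed to be the notebook's SparkContext exposing
    install_pypi_package; it is passed in rather than read from a global.
    """
    spec = pinned_spec(package, version)
    sc.install_pypi_package(spec)
    return spec
```

In a notebook cell this would be called as install_pinned(sc, "pandas", "1.2.5"). Whether 1.2.5 installs cleanly on a given EMR release is not something this sketch can guarantee.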
