BLD: compilation failure at comm_nccl.cu #959

Open
tylerjereddy opened this issue Oct 11, 2024 · 1 comment

Comments

@tylerjereddy

On the LANL Venado machine (Linux, ARM Grace-Hopper architecture), the same compilation error arises for a recently provided legate release whether I use the clang 18 toolchain (Cray clang version 18.0.0) or the gcc-13 (13.2.1) toolchain, in both cases with nvcc from CUDA 12.5. We only received a tarball, and the only version information I can find is CMakeLists.txt:set(legate_version 24.09.00). This may be a development version of that release rather than a tagged one. If you point me to the location of an embedded git hash, I'll grep it out for you, but as far as I can tell I only have a preview release tarball, not a git bundle.

Here are the steps I follow on Venado:

Setup of environment and compilation commands
cd /lustre/vescratch1/treddy/custom_nvidia/legate
rm -rf arch-linux-cuda-release
eval "$(/lustre/vescratch1/treddy/tyler_conda/conda_scratch/bin/conda shell.bash hook)"
conda activate legate_custom
set +o errexit
set +e 
module load PrgEnv-gnu/8.5.0
export CC=gcc-13
export CXX=g++-13
export CPATH=/opt/cray/libfabric/1.20.1/include:$CPATH
export LIBRARY_PATH=/opt/cray/libfabric/1.20.1/lib64:$LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/cray/libfabric/1.20.1/lib64:$LD_LIBRARY_PATH
module load cudatoolkit/24.7_12.5 
module load cray-hdf5-parallel/1.14.3.1
export LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.1.30/ofi/crayclang/17.0/lib:$LD_LIBRARY_PATH
export LIBRARY_PATH=/opt/cray/pe/mpich/8.1.30/ofi/crayclang/17.0/lib:$LIBRARY_PATH
export PATH=$PATH:/opt/cray/pe/cce/18.0.0/bin
export PATH=/opt/cray/libfabric/1.20.1/bin:$PATH
./configure --with-cuda --with-hdf5 --with-gasnet
export LEGATE_ARCH='arch-linux-cuda-release'
export LEGATE_DIR='/lustre/vescratch1/treddy/custom_nvidia/legate'
make -j 64

And here is the compilation failure (snipped at the end, because the C++ compilation output after the error is a bit much):

[212/308] Building CXX object _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_analysis.cc.o
In file included from /usr/include/c++/13/bits/specfun.h:43,
                 from /usr/include/c++/13/cmath:3699,
                 from /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/legion/legion_analysis.cc:16:
In static member function ‘static _Up* std::__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m(_Tp*, _Tp*, _Up*) [with _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*; _Up = Legion::Internal::CopyFillAggregator::CopyUpdate*; bool _IsMove = false]’,
    inlined from ‘_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = Legion::Internal::CopyFillAggregator::CopyUpdate**; _OI = Legion::Internal::CopyFillAggregator::CopyUpdate**]’ at /usr/include/c++/13/bits/stl_algobase.h:506:30,
    inlined from ‘_OI std::__copy_move_a1(_II, _II, _OI) [with bool _IsMove = false; _II = Legion::Internal::CopyFillAggregator::CopyUpdate**; _OI = Legion::Internal::CopyFillAggregator::CopyUpdate**]’ at /usr/include/c++/13/bits/stl_algobase.h:533:42,
    inlined from ‘_OI std::__copy_move_a(_II, _II, _OI) [with bool _IsMove = false; _II = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _OI = Legion::Internal::CopyFillAggregator::CopyUpdate**]’ at /usr/include/c++/13/bits/stl_algobase.h:540:31,
    inlined from ‘_OI std::copy(_II, _II, _OI) [with _II = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _OI = Legion::Internal::CopyFillAggregator::CopyUpdate**]’ at /usr/include/c++/13/bits/stl_algobase.h:633:7,
    inlined from ‘static _ForwardIterator std::__uninitialized_copy<true>::__uninit_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, std::vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _ForwardIterator = Legion::Internal::CopyFillAggregator::CopyUpdate**]’ at /usr/include/c++/13/bits/stl_uninitialized.h:147:27,
    inlined from ‘_ForwardIterator std::uninitialized_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _ForwardIterator = Legion::Internal::CopyFillAggregator::CopyUpdate**]’ at /usr/include/c++/13/bits/stl_uninitialized.h:185:15,
    inlined from ‘_ForwardIterator std::__uninitialized_copy_a(_InputIterator, _InputIterator, _ForwardIterator, allocator<_Tp>&) [with _InputIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _ForwardIterator = Legion::Internal::CopyFillAggregator::CopyUpdate**; _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*]’ at /usr/include/c++/13/bits/stl_uninitialized.h:373:37,
    inlined from ‘void std::vector<_Tp, _Alloc>::_M_range_insert(iterator, _ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, std::vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*; _Alloc = std::allocator<Legion::Internal::CopyFillAggregator::CopyUpdate*>]’ at /usr/include/c++/13/bits/vector.tcc:814:38,
    inlined from ‘std::vector<_Tp, _Alloc>::iterator std::vector<_Tp, _Alloc>::insert(const_iterator, _InputIterator, _InputIterator) [with _InputIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, std::vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; <template-parameter-2-2> = void; _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*; _Alloc = std::allocator<Legion::Internal::CopyFillAggregator::CopyUpdate*>]’ at /usr/include/c++/13/bits/stl_vector.h:1483:19,
    inlined from ‘void Legion::Internal::CopyFillAggregator::issue_copies(Legion::Internal::InstanceView*, std::map<Legion::Internal::InstanceView*, std::vector<CopyUpdate*> >&, std::set<Legion::Internal::RtEvent>&, Legion::Internal::ApEvent, const Legion::Internal::FieldMask&, const Legion::Internal::PhysicalTraceInfo&, bool, bool, std::vector<Legion::Internal::ApEvent>*)’ at /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/legion/legion_analysis.cc:7339:28:
/usr/include/c++/13/bits/stl_algobase.h:437:30: warning: ‘void* __builtin_memmove(void*, const void*, long unsigned int)’ writing between 9 and 9223372036854775800 bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
  437 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/13/aarch64-suse-linux/bits/c++allocator.h:33,
                 from /usr/include/c++/13/bits/allocator.h:46,
                 from /usr/include/c++/13/bits/stl_tree.h:64,
                 from /usr/include/c++/13/map:62,
                 from /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/legion/legion_types.h:30,
                 from /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/legion.h:56,
                 from /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/legion/legion_analysis.cc:17:
In member function ‘_Tp* std::__new_allocator<_Tp>::allocate(size_type, const void*) [with _Tp = Legion::Internal::InstanceView*]’,
    inlined from ‘static _Tp* std::allocator_traits<std::allocator<_Tp1> >::allocate(allocator_type&, size_type) [with _Tp = Legion::Internal::InstanceView*]’ at /usr/include/c++/13/bits/alloc_traits.h:482:28,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::pointer std::_Vector_base<_Tp, _Alloc>::_M_allocate(std::size_t) [with _Tp = Legion::Internal::InstanceView*; _Alloc = std::allocator<Legion::Internal::InstanceView*>]’ at /usr/include/c++/13/bits/stl_vector.h:378:33,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::pointer std::_Vector_base<_Tp, _Alloc>::_M_allocate(std::size_t) [with _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*; _Alloc = std::allocator<Legion::Internal::CopyFillAggregator::CopyUpdate*>]’ at /usr/include/c++/13/bits/stl_vector.h:375:7,
    inlined from ‘void std::vector<_Tp, _Alloc>::_M_range_insert(iterator, _ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, std::vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*; _Alloc = std::allocator<Legion::Internal::CopyFillAggregator::CopyUpdate*>]’ at /usr/include/c++/13/bits/vector.tcc:805:40,
    inlined from ‘std::vector<_Tp, _Alloc>::iterator std::vector<_Tp, _Alloc>::insert(const_iterator, _InputIterator, _InputIterator) [with _InputIterator = __gnu_cxx::__normal_iterator<Legion::Internal::CopyFillAggregator::CopyUpdate**, std::vector<Legion::Internal::CopyFillAggregator::CopyUpdate*> >; <template-parameter-2-2> = void; _Tp = Legion::Internal::CopyFillAggregator::CopyUpdate*; _Alloc = std::allocator<Legion::Internal::CopyFillAggregator::CopyUpdate*>]’ at /usr/include/c++/13/bits/stl_vector.h:1483:19,
    inlined from ‘void Legion::Internal::CopyFillAggregator::issue_copies(Legion::Internal::InstanceView*, std::map<Legion::Internal::InstanceView*, std::vector<CopyUpdate*> >&, std::set<Legion::Internal::RtEvent>&, Legion::Internal::ApEvent, const Legion::Internal::FieldMask&, const Legion::Internal::PhysicalTraceInfo&, bool, bool, std::vector<Legion::Internal::ApEvent>*)’ at /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/legion/legion_analysis.cc:7339:28:
/usr/include/c++/13/bits/new_allocator.h:151:55: note: at offset [-9223372036854775808, -1] into destination object of size [8, 9223372036854775800] allocated by ‘operator new’
  151 |         return static_cast<_Tp*>(_GLIBCXX_OPERATOR_NEW(__n * sizeof(_Tp)));
      |                                                       ^
[296/308] Building CUDA object src/cpp/CMakeFiles/legate.dir/legate/comm/detail/comm_nccl.cu.o
FAILED: src/cpp/CMakeFiles/legate.dir/legate/comm/detail/comm_nccl.cu.o 
/opt/nvidia/hpc_sdk/Linux_aarch64/24.7/cuda/12.5/bin/nvcc -forward-unknown-to-host-compiler -DFMT_SHARED -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DUSE_CUDA -DUSE_HDF -Dlegate_EXPORTS -I/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp -I/lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/src/cpp/include/legate -I/lustre/vescratch1/treddy/custom_nvidia/legate/share/legate/mpi_wrapper/src -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/cccl-src/thrust/thrust/cmake/../.. -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/cccl-src/libcudacxx/lib/cmake/libcudacxx/../../../include -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/cccl-src/cub/cub/cmake/../.. -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-src/runtime/mappers -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/legion-build/runtime -isystem /opt/nvidia/hpc_sdk/Linux_aarch64/24.7/cuda/12.5/targets/sbsa-linux/include -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/mdspan-src/include -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/span-src/include -isystem /lustre/vescratch1/treddy/tyler_conda/conda_scratch/envs/legate_custom/include -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/fmt-src/include -isystem /lustre/vescratch1/treddy/custom_nvidia/legate/arch-linux-cuda-release/cmake_build/_deps/argparse-src/include --compiler-options=-O3 -O2 -std=c++17 -arch=all-major -Xcompiler=-fPIC -Xfatbin=-compress-all 
--expt-extended-lambda --expt-relaxed-constexpr -Wno-deprecated-gpu-targets -MD -MT src/cpp/CMakeFiles/legate.dir/legate/comm/detail/comm_nccl.cu.o -MF src/cpp/CMakeFiles/legate.dir/legate/comm/detail/comm_nccl.cu.o.d -x cu -c /lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/comm/detail/comm_nccl.cu -o src/cpp/CMakeFiles/legate.dir/legate/comm/detail/comm_nccl.cu.o
/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/task/variant_helper.h: In instantiation of ‘static void legate::detail::VariantHelper<T, SELECTOR, true>::record(const legate::Library&, legate::TaskInfo*, const std::map<legate::VariantCode, legate::VariantOptions>&) [with T = legate::detail::comm::nccl::InitId; SELECTOR = legate::detail::GPUVariant]’:
/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/task/task.inl:55:64:   required from ‘static std::unique_ptr<legate::TaskInfo> legate::LegateTask<T>::create_task_info_(const legate::Library&, const std::map<legate::VariantCode, legate::VariantOptions>&) [with T = legate::detail::comm::nccl::InitId]’
/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/task/task.inl:44:37:   required from ‘static void legate::LegateTask<T>::register_variants(legate::Library, legate::LocalTaskID, const std::map<legate::VariantCode, legate::VariantOptions>&) [with T = legate::detail::comm::nccl::InitId]’
/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/task/task.inl:37:18:   required from ‘static void legate::LegateTask<T>::register_variants(legate::Library, const std::map<legate::VariantCode, legate::VariantOptions>&) [with T = legate::detail::comm::nccl::InitId]’
/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/comm/detail/comm_nccl.cu:277:56:   required from here
/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/task/variant_helper.h:133:16: error: unable to deduce ‘const auto’ from ‘task_wrapper_<std::invoke_result_t<ncclUniqueId (* const)(const Legion::Task*, const std::vector<Legion::PhysicalRegion, std::allocator<Legion::PhysicalRegion> >&, Legion::Internal::TaskContext*, Legion::Runtime*), const Legion::Task*, const std::vector<Legion::PhysicalRegion, std::allocator<Legion::PhysicalRegion> >&, Legion::Internal::TaskContext*, Legion::Runtime*>, variant_impl, variant_kind>’
       constexpr auto entry = T::BASE::template task_wrapper_<RET, variant_impl, variant_kind>;
@lightsighter
Contributor

You can ignore the warning for the legion_analysis.cc translation unit. It is a false positive caused by a bug in the -Wstringop-overflow static analysis, and that bug is present in many compilers. You can read more about it here.

The real problem is this:

/lustre/vescratch1/treddy/custom_nvidia/legate/src/cpp/legate/task/variant_helper.h:133:16: error: unable to deduce ‘const auto’ from ‘task_wrapper_<std::invoke_result_t<ncclUniqueId (* const)(const Legion::Task*, const std::vector<Legion::PhysicalRegion, std::allocator<Legion::PhysicalRegion> >&, Legion::Internal::TaskContext*, Legion::Runtime*), const Legion::Task*, const std::vector<Legion::PhysicalRegion, std::allocator<Legion::PhysicalRegion> >&, Legion::Internal::TaskContext*, Legion::Runtime*>, variant_impl, variant_kind>’
       constexpr auto entry = T::BASE::template task_wrapper_<RET, variant_impl, variant_kind>;
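The failing line deduces constexpr auto from a template-id that names a static member function template specialization, where RET is computed with std::invoke_result_t from a const function pointer. The sketch below reproduces that shape using hypothetical stand-in types (TaskBase, InitTask, UniqueId, VariantKind and record are illustrative names, not Legate's API). In isolation this form is valid C++17 and should build with a conforming host compiler, so it is not a reproducer of the nvcc failure. It also shows an explicitly typed function pointer next to the auto form; whether that alternative sidesteps the error in comm_nccl.cu is an untested assumption, not a confirmed fix.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the types named in the log; not Legate's API.
enum class VariantKind { CPU, GPU };

struct UniqueId { int value; };  // plays the role of ncclUniqueId

struct TaskBase {
  // Static member function template, analogous to task_wrapper_<RET, impl, kind>.
  template <typename Ret, Ret (*Impl)(int), VariantKind Kind>
  static int task_wrapper_(int arg) {
    Ret r = Impl(arg);
    return r.value + (Kind == VariantKind::GPU ? 100 : 0);
  }
};

struct InitTask {
  using BASE = TaskBase;  // mirrors T::BASE in variant_helper.h
  static UniqueId gpu_variant(int arg) { return UniqueId{arg * 2}; }
};

// Hypothetical analogue of VariantHelper<T, SELECTOR, true>::record.
template <typename T, auto variant_impl, VariantKind variant_kind>
int record(int arg) {
  // RET is derived from the function pointer type, as in the error message.
  using RET = std::invoke_result_t<decltype(variant_impl), int>;

  // Same shape as variant_helper.h:133: 'auto' deduced from a template-id
  // naming exactly one function template specialization.
  constexpr auto entry =
      T::BASE::template task_wrapper_<RET, variant_impl, variant_kind>;

  // Explicitly typed alternative (assumption): no 'auto' deduction needed.
  constexpr int (*entry_explicit)(int) =
      &T::BASE::template task_wrapper_<RET, variant_impl, variant_kind>;

  return entry(arg) + entry_explicit(arg);
}
```

For example, record<InitTask, &InitTask::gpu_variant, VariantKind::GPU>(5) returns 220: each pointer calls gpu_variant(5), which yields 10, and the GPU branch adds 100.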
