Building with release tarball throws tuner_v1.h not found error #382

Closed
sirutBuasai opened this issue Apr 10, 2024 · 1 comment

@sirutBuasai

I noticed a discrepancy between the release tarball (https://github.com/aws/aws-ofi-nccl/releases/download/v1.9.0-aws/aws-ofi-nccl-1.9.0-aws.tar.gz) and the source code tarball (https://github.com/aws/aws-ofi-nccl/archive/refs/tags/v1.9.0-aws.tar.gz): the release tarball doesn't contain the tuner_v*.h header files, while the source code tarball does.

As a result, building from the release tarball fails with the following error:

In file included from tuner/nccl_ofi_model.c:6:
../include/nccl-headers/nvidia/tuner.h:32:10: fatal error: tuner_v1.h: No such file or directory
   32 | #include "tuner_v1.h"
      |          ^~~~~~~~~~~~
compilation terminated.

However, building from the source code tarball succeeds with the following commands:

./autogen.sh
./configure \
  --prefix=$PREFIX \
  --with-mpi=/opt/amazon/openmpi \
  --with-libfabric=/opt/amazon/efa \
  --with-cuda=/usr/local/cuda-12.1 \
  --disable-tests
make
make install
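To check whether a given tarball ships the tuner headers before building, you can list its contents. The following is a minimal POSIX shell sketch. The function name `has_tuner_headers` is hypothetical and not part of aws-ofi-nccl. It assumes a `tar` that accepts `-tzf` for gzip-compressed archives, and it matches any `tuner_v<N>.h` member regardless of which directory it sits in.

```shell
#!/bin/sh
# Hypothetical helper (not part of aws-ofi-nccl): report whether a
# gzip-compressed tarball contains any tuner_v<N>.h header.
# Assumes a tar implementation that supports -tzf (GNU tar, bsdtar).
has_tuner_headers() {
    tarball="$1"
    # List archive members and look for a path component named tuner_v<digits>.h
    if tar -tzf "$tarball" | grep -Eq '(^|/)tuner_v[0-9]+\.h$'; then
        echo "found tuner_v*.h in $tarball"
        return 0
    else
        echo "no tuner_v*.h in $tarball" >&2
        return 1
    fi
}
```

For example, running `has_tuner_headers aws-ofi-nccl-1.9.0-aws.tar.gz` would be expected, based on this report, to return non-zero for the v1.9.0-aws release tarball and zero for the source code tarball.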
rajachan added a commit to rajachan/aws-ofi-nccl that referenced this issue Apr 10, 2024
This was not caught in manual tests when a dist was built on a non-EC2
machine (or in Github Actions environment) since `WANT_PLATFORM_AWS`
paths would not be enabled in the Makefile then.

Fixes: aws#382

Signed-off-by: Raghu Raja <raghunch@amazon.com>
rajachan added a commit to rajachan/aws-ofi-nccl that referenced this issue Apr 11, 2024
Fixes aws#382

Signed-off-by: Raghu Raja <raghunch@amazon.com>
@rajachan
Member

Thanks for reporting the issue, @sirutBuasai. The release artifact was indeed missing the headers. We are working on a bugfix release that should address this issue.

rajachan added a commit to rajachan/aws-ofi-nccl that referenced this issue Apr 11, 2024
Fixes aws#382

Signed-off-by: Raghu Raja <raghunch@amazon.com>
(cherry picked from commit 3df302e)
rajachan added a commit that referenced this issue Apr 12, 2024
Fixes #382

Signed-off-by: Raghu Raja <raghunch@amazon.com>
(cherry picked from commit 3df302e)