MPI_Finalize slow with CUDA + OpenMPI 2.x (known OpenMPI issue; fixed in 3.1) #2698
Comments
I ran this test with verbose mode enabled. It looks like a known OpenMPI issue, open-mpi/ompi#3244.
I am using OpenMPI 2.0.1, which, according to the above link, manifests the issue. The patch that fixes it was reportedly merged in OpenMPI 3.1. @bartlettroscoe @nmhamster @micahahoward might want to know about this.
@mhoemmen said:
We are not seeing runtimes like that in any of the current Trilinos builds (including all of the ATDM Trilinos builds) as shown today, for example, at: The max runtime for that test shown there is 31 seconds.
Today: @trilinos/tpetra suggests adding a configure-time test to Tpetra's CMake logic, to detect the OpenMPI version and report a warning (not an error) if it's one of the versions known to have this issue.
@trilinos/framework The test should be at the Trilinos CMake level, rather than at the Tpetra level, right? Codes could call MPI_Finalize() without enabling Tpetra.
@kddevin wrote:
I agree. The test applies to any package that uses both MPI and CUDA. It can't be in Kokkos, because Kokkos (at least Core) does not depend on MPI. STK depends on Kokkos (Core) and MPI, but not Tpetra. Thus, it makes practical sense for the test to live at the Trilinos CMake level.
@jwillenbring and I chatted about this on the phone today. I think it's a higher priority to fix the Dashboard CUDA builds so they use the right version of OpenMPI. Jim asked whether it would make more sense for the configure process to stop with an error instead of just printing a warning; I thought that would be good but there would always be that one user: https://xkcd.com/1172/
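For illustration only, the version check discussed above could be sketched like this. It is written in C++ so it can be tried standalone; the real test would live in the Trilinos CMake logic. The names `Version`, `parseVersion`, and `isAffectedOpenMpi` are hypothetical, and the affected range (2.0.0 up to but not including 3.1.0) is an assumption based only on "2.x manifests it, fixed in 3.1" from this thread; the exact range should be confirmed against open-mpi/ompi#3244.

```cpp
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical holder for an OpenMPI version such as "2.0.1".
struct Version {
  int majorNum;
  int minorNum;
  int patchNum;
};

// Parse "X.Y" or "X.Y.Z" into out. Returns false if the text does not
// start with at least "X.Y". Anything after the patch number is ignored.
bool parseVersion(const std::string& text, Version& out) {
  std::istringstream in(text);
  Version v{0, 0, 0};
  char dot1 = 0;
  if (!(in >> v.majorNum >> dot1 >> v.minorNum) || dot1 != '.') {
    return false;
  }
  char dot2 = 0;
  if (in >> dot2) {
    // A third component is optional, but if present it must be ".Z".
    if (dot2 != '.' || !(in >> v.patchNum)) {
      return false;
    }
  }
  out = v;
  return true;
}

// ASSUMPTION: versions in [2.0.0, 3.1.0) show the slow MPI_Finalize with CUDA.
bool isAffectedOpenMpi(const Version& v) {
  const auto key = std::make_tuple(v.majorNum, v.minorNum, v.patchNum);
  return key >= std::make_tuple(2, 0, 0) && key < std::make_tuple(3, 1, 0);
}
```

A caller would parse the detected version string and print a warning (rather than stop with an error, per the discussion above) when `isAffectedOpenMpi` returns true.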
Makes it easier to load modules and run utilities out of tribits/python_utils. Since tribits/ci_support depends on tribits/python_utils, this is not making things any less general. This will make it easier to write unit tests for cdash_build_testing_date.py.
…2698) This should complete the major features for a TriBITS-based install for Trilinos that is robust to package build failures and correctly sets the installed directory permissions.
Build/Test Cases Summary
Enabled Packages:
Enabled all Packages
0) MPI_DEBUG => passed: passed=353,notpassed=0 (1.14 min)
1) SERIAL_RELEASE => passed: passed=353,notpassed=0 (1.20 min)
Other local commits for this build/test group: 747ad3d, be2522b, 4903a53
…ault (trilinos/Trilinos#2698) Turns out the default for <Project>_ENABLE_INSTALL_CMAKE_CONFIG_FILES is OFF, not ON. That was very confusing. It is important that we test installs of TribitsExampleProject where there are install failures and we ensure that the file <Project>Config.cmake gets installed correctly and is usable when an installation fails.
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
This issue was closed due to inactivity for 395 days.
Tpetra::CrsMatrix UnitTests2 takes > 560s in a CUDA 8 release build on K80. Seriously, what's going on? Do I need different KOKKOS_ARCH settings? It sure would have been nice to have had some performance tracking so we could have caught this. I don't think this is anything we did; we've only been fixing CUDA issues over time. @trilinos/tpetra