TOOLS/PERF: changes to enable the rocm perf modules #4434
Can one of the admins verify this patch? |
Mellanox CI: FAILED on 5 of 25 workers. Note: the logs will be deleted after 19-Nov-2019
|
Mellanox CI: FAILED on 5 of 25 workers. Note: the logs will be deleted after 19-Nov-2019
|
bot:retest |
Mellanox CI: FAILED on 9 of 25 workers. Note: the logs will be deleted after 20-Nov-2019
|
@paklui the failures seem to be related to your PR:
|
Yes, I noticed the Mellanox CI was giving errors because it has a much older version (1.5) of the HIP and ROCm stack. I asked @yosefe earlier to have the sysadmin update the ROCm stack to the latest version, 2.9. I think Mellanox CI runs on CentOS; see https://rocm.github.io/ROCmInstall.html#centosrhel-7-76-support for the latest ROCm setup instructions. I checked my log files on a couple of systems and they seem to be compiling fine without any issues. |
Included additional checks to prevent an old HIP version from building the ROCm perftest. The ROCm perftest now:
compiles when hipconfig is a recent version;
is skipped when hipconfig is old;
is skipped when hipconfig is not available.
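The three cases above could be sketched roughly as the following shell logic. This is only an illustration, not the actual check added in this PR: the function names `version_at_least` and `rocm_perftest_decision` are hypothetical, and the minimum version `2.0` is an assumed threshold, since the thread does not state the real one. It also assumes `hipconfig --version` prints a dotted `major.minor...` string.

```shell
#!/bin/sh
# Assumed minimum HIP version; the real threshold is not given in this thread.
MIN_HIP_VERSION="2.0"

# Return 0 if dotted version $1 is >= dotted version $2.
# Only the major and minor fields are compared; both arguments
# are expected to contain at least "major.minor".
version_at_least() {
    have_major=$(printf '%s' "$1" | cut -d. -f1)
    have_minor=$(printf '%s' "$1" | cut -d. -f2)
    need_major=$(printf '%s' "$2" | cut -d. -f1)
    need_minor=$(printf '%s' "$2" | cut -d. -f2)
    [ "$have_major" -gt "$need_major" ] && return 0
    [ "$have_major" -lt "$need_major" ] && return 1
    [ "$have_minor" -ge "$need_minor" ]
}

# Print the build decision for a hipconfig version string.
# An empty string means hipconfig was not found.
rocm_perftest_decision() {
    if [ -z "$1" ]; then
        echo "skip: hipconfig not available"
    elif version_at_least "$1" "$MIN_HIP_VERSION"; then
        echo "build"
    else
        echo "skip: hipconfig too old"
    fi
}

# Query the installed hipconfig, if there is one.
if command -v hipconfig >/dev/null 2>&1; then
    hip_version=$(hipconfig --version)
else
    hip_version=""
fi

rocm_perftest_decision "$hip_version"
```

Keeping the comparison in a function that takes the version string as an argument makes each case easy to exercise without a ROCm installation on the node.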
|
Mellanox CI: FAILED on 7 of 25 workers. Note: the logs will be deleted after 05-Dec-2019
|
Mellanox CI: FAILED on 8 of 25 workers. Note: the logs will be deleted after 05-Dec-2019
|
bot:mlx:retest |
Mellanox CI: FAILED on 8 of 25 workers. Note: the logs will be deleted after 07-Dec-2019
|
bot:mlx:retest |
Mellanox CI: FAILED on 7 of 25 workers. Note: the logs will be deleted after 07-Dec-2019
|
bot:mlx:retest |
Mellanox CI: FAILED on 5 of 25 workers. Note: the logs will be deleted after 08-Dec-2019
|
bot:mlx:retest |
@paklui is there any missing lib? |
@yosefe It should be there if the hip_hcc rpm is installed on the test node. I don't have access to the logs in MellanoxLab; is that where it's failing? |
See #4434 (comment) |
What
This PR includes the changes needed to build the perf modules for ROCm, so that ucx_perftest can run on a ROCm device. It is a do-over of the previous PR #4349, which had an issue with its commit title.
Why?
ROCm support in ucx_perftest is not currently enabled; this change only enables it. It does not change the existing ROCm functionality in UCX.