
TOOLS/PERF: Enable ROCM memory type support for UCP perf tests #4587

Merged: 2 commits into openucx:master from topic/sourav/add-rocm-perftest, Dec 20, 2019

Conversation

@souravzzz (Member)

What

Adds ROCm memory type support to the UCP perf tests.

Why?

- ROCm memory type support was missing from the perf tests.
- It is required for manual testing and CI.

Sample runs:

```
$ ./bin/ucx_perftest localhost -f -c 2 -t tag_lat -m rocm
+--------------+-----------------------------+---------------------+-----------------------+
|              |       latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+---------+---------+---------+----------+----------+-----------+-----------+
| # iterations | typical | average | overall |  average |  overall |   average |   overall |
+--------------+---------+---------+---------+----------+----------+-----------+-----------+
       1000000     1.458     8.188     9.640       0.93       0.79      122133      103732
```

```
$ ./bin/ucx_perftest localhost -f -c 2 -t tag_bw -m rocm
+--------------+-----------------------------+---------------------+-----------------------+
|              |       latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+---------+---------+---------+----------+----------+-----------+-----------+
| # iterations | typical | average | overall |  average |  overall |   average |   overall |
+--------------+---------+---------+---------+----------+----------+-----------+-----------+
       1000000     1.224    11.347    10.781       0.67       0.71       88132       92760
```
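For context, the following is a minimal sketch (not the actual patch) of the allocation pattern that `-m rocm` implies: the test buffers are placed in ROCm device memory through the HIP runtime instead of being taken from host memory with `malloc`. The helper names, the enum, and the build line are illustrative assumptions, not UCX APIs; only `hipMalloc`, `hipMemset`, and `hipFree` are real HIP calls.

```c
/*
 * Hypothetical sketch: allocate perf-test buffers in ROCm device memory.
 * Build (assuming a working ROCm/HIP install):
 *   hipcc rocm_alloc_sketch.c -o rocm_alloc_sketch
 */
#include <hip/hip_runtime_api.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative memory-type switch, mirroring the idea behind "-m rocm" */
typedef enum { MEM_TYPE_HOST, MEM_TYPE_ROCM } mem_type_t;

static void *buffer_alloc(size_t size, mem_type_t mem_type)
{
    void *ptr = NULL;

    if (mem_type == MEM_TYPE_ROCM) {
        /* Device allocation: the transport layer must then accept
         * device pointers, which is what this PR enables for the
         * UCP perf tests. */
        if (hipMalloc(&ptr, size) != hipSuccess) {
            fprintf(stderr, "hipMalloc(%zu) failed\n", size);
            exit(EXIT_FAILURE);
        }
        hipMemset(ptr, 0, size); /* zero the device buffer once */
    } else {
        ptr = calloc(1, size);
    }
    return ptr;
}

static void buffer_free(void *ptr, mem_type_t mem_type)
{
    if (mem_type == MEM_TYPE_ROCM) {
        hipFree(ptr);
    } else {
        free(ptr);
    }
}

int main(void)
{
    void *buf = buffer_alloc(4096, MEM_TYPE_ROCM);
    buffer_free(buf, MEM_TYPE_ROCM);
    printf("allocated and released a 4 KiB ROCm device buffer\n");
    return 0;
}
```

Note that running with `-m rocm` also assumes UCX itself was built with ROCm support enabled; otherwise the memory type is not available to the tests.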

@swx-jenkins3 (Collaborator)

Can one of the admins verify this patch?

@dmitrygx (Member)

ok to test

@mellanox-github (Contributor)

Mellanox CI: FAILED on 6 of 25 workers

Note: the logs will be deleted after 23-Dec-2019

| Agent/Stage | Status |
| --- | --- |
| _main | ❌ FAILURE |
| hpc-arm-cavium-jenkins_W2 | ❌ FAILURE |
| hpc-arm-hwi-jenkins_W2 | ❌ FAILURE |
| hpc-test-node-legacy_W2 | ❌ FAILURE |
| hpc-test-node-new_W2 | ❌ FAILURE |
| r-vmb-ppc-jenkins_W2 | ❌ FAILURE |
| hpc-arm-cavium-jenkins_W0 | ✔️ SUCCESS |
| hpc-arm-cavium-jenkins_W1 | ✔️ SUCCESS |
| hpc-arm-cavium-jenkins_W3 | ✔️ SUCCESS |
| hpc-arm-hwi-jenkins_W0 | ✔️ SUCCESS |
| hpc-arm-hwi-jenkins_W1 | ✔️ SUCCESS |
| hpc-arm-hwi-jenkins_W3 | ✔️ SUCCESS |
| hpc-test-node-gpu_W0 | ✔️ SUCCESS |
| hpc-test-node-gpu_W1 | ✔️ SUCCESS |
| hpc-test-node-gpu_W2 | ✔️ SUCCESS |
| hpc-test-node-gpu_W3 | ✔️ SUCCESS |
| hpc-test-node-legacy_W0 | ✔️ SUCCESS |
| hpc-test-node-legacy_W1 | ✔️ SUCCESS |
| hpc-test-node-legacy_W3 | ✔️ SUCCESS |
| hpc-test-node-new_W0 | ✔️ SUCCESS |
| hpc-test-node-new_W1 | ✔️ SUCCESS |
| hpc-test-node-new_W3 | ✔️ SUCCESS |
| r-vmb-ppc-jenkins_W0 | ✔️ SUCCESS |
| r-vmb-ppc-jenkins_W1 | ✔️ SUCCESS |
| r-vmb-ppc-jenkins_W3 | ✔️ SUCCESS |

@mellanox-github (Contributor)

Mellanox CI: PASSED on 25 workers

Note: the logs will be deleted after 24-Dec-2019

| Agent/Stage | Status |
| --- | --- |
| _main | ✔️ SUCCESS |
| hpc-arm-cavium-jenkins_W0 | ✔️ SUCCESS |
| hpc-arm-cavium-jenkins_W1 | ✔️ SUCCESS |
| hpc-arm-cavium-jenkins_W2 | ✔️ SUCCESS |
| hpc-arm-cavium-jenkins_W3 | ✔️ SUCCESS |
| hpc-arm-hwi-jenkins_W0 | ✔️ SUCCESS |
| hpc-arm-hwi-jenkins_W1 | ✔️ SUCCESS |
| hpc-arm-hwi-jenkins_W2 | ✔️ SUCCESS |
| hpc-arm-hwi-jenkins_W3 | ✔️ SUCCESS |
| hpc-test-node-gpu_W0 | ✔️ SUCCESS |
| hpc-test-node-gpu_W1 | ✔️ SUCCESS |
| hpc-test-node-gpu_W2 | ✔️ SUCCESS |
| hpc-test-node-gpu_W3 | ✔️ SUCCESS |
| hpc-test-node-legacy_W0 | ✔️ SUCCESS |
| hpc-test-node-legacy_W1 | ✔️ SUCCESS |
| hpc-test-node-legacy_W2 | ✔️ SUCCESS |
| hpc-test-node-legacy_W3 | ✔️ SUCCESS |
| hpc-test-node-new_W0 | ✔️ SUCCESS |
| hpc-test-node-new_W1 | ✔️ SUCCESS |
| hpc-test-node-new_W2 | ✔️ SUCCESS |
| hpc-test-node-new_W3 | ✔️ SUCCESS |
| r-vmb-ppc-jenkins_W0 | ✔️ SUCCESS |
| r-vmb-ppc-jenkins_W1 | ✔️ SUCCESS |
| r-vmb-ppc-jenkins_W2 | ✔️ SUCCESS |
| r-vmb-ppc-jenkins_W3 | ✔️ SUCCESS |

@souravzzz (Member, Author)

@yosefe please review

@souravzzz (Member, Author)

The Azure pipeline seems to be stuck (https://dev.azure.com/ucfconsort/0b36e3f0-8ab9-4a48-b68b-4b2350e02c88/_build/results?buildId=3167). Can someone please take a look?

@brminich (Contributor)

@amaslenn, is it the same Azure issue we saw in #4589?

@amaslenn (Contributor) commented Dec 18, 2019

> @amaslenn, is it the same Azure issue we saw in #4589?

Yeah, restarting.

@amaslenn (Contributor)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@brminich (Contributor)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@souravzzz (Member, Author)

@amaslenn @brminich still the same Azure issue as in #4589.

@yosefe merged commit 79c4dde into openucx:master on Dec 20, 2019.
@souravzzz deleted the topic/sourav/add-rocm-perftest branch on December 23, 2019, 04:09.