UCP/WIREUP: Consider local distance during slow lanes dropping - v1.16.x - revert #9781
Conversation
Do we need to revert it on the master branch?
No, we don't; the fix works OK on master.
It was just a logical conclusion. The original fix that works on master checks that the UCX version is >= 17 before applying the new logic; with an older UCX version it leaves the logic as is, to avoid different lane performance on old UCX. If we somehow change that for UCX 1.16, it will break the logic on the master branch.
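To illustrate the version gating described above, here is a minimal sketch, not the actual UCX code. The struct, the function names, the distance factor, and the assumption that the checked version is the peer's minor version are all hypothetical. It only shows why one side changing the gate can make the two sides drop different lanes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lane description; NOT the real UCX data structures. */
typedef struct {
    double bandwidth;      /* estimated lane bandwidth */
    double local_distance; /* illustrative factor in (0, 1]; lower = NIC farther from GPU */
} lane_info_t;

/* Assumption: new logic applies only when the peer's minor version is >= 17 */
#define NEW_LOGIC_MIN_PEER_MINOR 17

/* Score a lane; local distance is considered only for new-enough peers. */
double lane_score(const lane_info_t *lane, unsigned peer_minor)
{
    if (peer_minor >= NEW_LOGIC_MIN_PEER_MINOR) {
        return lane->bandwidth * lane->local_distance;
    }

    /* Old peers: keep the old logic so both sides select the same lanes */
    return lane->bandwidth;
}

/*
 * Keep lanes whose score is at least min_ratio of the best score and
 * drop the rest as "slow". Returns the number of kept lanes.
 */
size_t select_fast_lanes(const lane_info_t *lanes, size_t num_lanes,
                         unsigned peer_minor, double min_ratio, bool *keep)
{
    double best = 0.0;
    size_t kept = 0;
    size_t i;

    assert((lanes != NULL) && (keep != NULL));

    for (i = 0; i < num_lanes; ++i) {
        double score = lane_score(&lanes[i], peer_minor);
        if (score > best) {
            best = score;
        }
    }

    for (i = 0; i < num_lanes; ++i) {
        keep[i] = lane_score(&lanes[i], peer_minor) >= (best * min_ratio);
        if (keep[i]) {
            ++kept;
        }
    }

    return kept;
}
```

In this sketch, two equal-bandwidth lanes with different distance factors are both kept for a 1.16 peer, while the farther one is dropped for a 1.17 peer. If a 1.16 build started applying the distance without the other side knowing, the two sides could disagree on the lane set.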
I think this error may require a specific setup to reproduce.
Maybe because CI doesn't test the current v1.16.x against the master branch?
Yes, that should be the case, but it also needs a system with several NICs that have different distances to the GPU. The original issue can be reproduced on DGX1.
ok
What's wrong with the commit title?
We do test it: https://github.com/openucx/ucx/blob/master/buildlib/pr/wire_compat.yml#L11
I think that with the current CI config, future master PRs would have failed after we merged something bad to v1.16.x.
This reverts commit c8fb76b.
efe33b8 to 0287705
@yosefe pls review
What
Reverting the backport of #9512 since it breaks wire compatibility.