Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCS/QUEUE: Fix ucs_queue_for_each() with armclang #2968

Merged
merged 4 commits into from
Oct 23, 2018

Conversation

yosefe
Copy link
Contributor

@yosefe yosefe commented Oct 19, 2018

Fixes #2928

armclang assumes an address of struct field cannot be NULL, so the exit
condition in ucs_queue_for_each() does not work. Fix by checking the
pointer to the struct instead of its member.
@yosefe yosefe added the Bugfix label Oct 19, 2018
@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5444/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8007/ for details (Mellanox internal link).

@shamisp
Copy link
Contributor

shamisp commented Oct 19, 2018

@yosefe looks like clang thing, not arm specific.
@nSircombe please take a look.

@shamisp
Copy link
Contributor

shamisp commented Oct 19, 2018

Also, since this is compilation issues, IMHO it should be back ported to v1.4.x tree.

@nSircombe
Copy link

nSircombe commented Oct 19, 2018

Yes, the issue also effects clang.

Had a quick go with clang (LLVM5) and Armclang (a version built off of LLVM7), both now fail on test_callbackq_noflags.remove_if...

[----------] 2 tests from test_callbackq_noflags, where TypeParam = 
[ RUN      ] test_callbackq_noflags.oneshot
[       OK ] test_callbackq_noflags.oneshot (0 ms)
[ RUN      ] test_callbackq_noflags.remove_if
[man-08:32748:0:32748]   callbackq.c:180  Assertion `idx <= last_idx' failed

@yosefe
Copy link
Contributor Author

yosefe commented Oct 22, 2018

@nSircombe @shamisp i've tried with 'Arm C/C++/Fortran Compiler version 18.4.2 (build number 4) (based on LLVM 5.0.2)' and did not see a problem with 'test_callbackq_noflags.remove_if'
Do we have any system with remote access which shows the issue?

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5452/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8028/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8018/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8022/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8026/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8016/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8021/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8025/ for details (Mellanox internal link).

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5453/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8029/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8033/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8032/ for details (Mellanox internal link).

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5456/ for details.

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5457/ for details.

@nSircombe
Copy link

nSircombe commented Oct 22, 2018 via email

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5460/ for details.

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5461/ for details.

@shamisp
Copy link
Contributor

shamisp commented Oct 22, 2018

@yosefe Do we need backport this ?

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8035/ for details (Mellanox internal link).

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5465/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8045/ for details (Mellanox internal link).

@swx-jenkins1
Copy link

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5472/ for details.

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8047/ for details (Mellanox internal link).

@yosefe
Copy link
Contributor Author

yosefe commented Oct 23, 2018

@hoopoepg can you take a look again?

ucs_queue_push(&head, &elem3->queue);
ucs_queue_push(&head, &elem4->queue);

for (int i = 0; i < 4; ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor:

for (int i = 0; i < elems.size(); ++i) {

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5483/ for details.

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8055/ for details (Mellanox internal link).

@yosefe yosefe merged commit 589f5c0 into openucx:master Oct 23, 2018
@yosefe yosefe deleted the topic/ucs-queue-iter-fix-segv branch October 23, 2018 12:58
@shamisp
Copy link
Contributor

shamisp commented Oct 23, 2018

@nSircombe did it resolve the problem ?
@yosefe backport ?

@shamisp shamisp mentioned this pull request Oct 23, 2018
@nSircombe
Copy link

@shamisp - yep, it builds and runs with the latest release (Armclang 18.4.2) - thanks @yosefe!

shamisp added a commit that referenced this pull request Oct 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants