ldmsd hanging on delete_thread fix #1449

Merged 1 commit into ovis-hpc:b4.4 on Sep 26, 2024

Conversation

jennfshr
Collaborator

The ldmsd aggregators on one of our production clusters would enter a deadlock state, waiting on delete_thread indefinitely. The aggregator daemon kept running and used 100% CPU. In a gdb session against an aggregator daemon instance built with debug symbols, we found that the issue manifests in ldms/src/core/ldms.c, where an if statement lacked a test to confirm the success of a red/black node relationship (?).
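To illustrate the general failure mode described above (this is not the actual patch), here is a minimal sketch of a removal loop that spins when a membership test is missing. Everything in it is an assumption made for illustration: `struct container`, `container_find`, `container_remove`, and `remove_all` are hypothetical names, and the array-backed container only stands in for a tree. None of it is the LDMS rbt API or the code in ldms.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical keyed container standing in for a tree; NOT the LDMS rbt API. */
struct container {
    int keys[MAX_NODES];
    size_t count;
};

/* Insert a key; returns false when the container is full. */
static bool container_add(struct container *c, int key)
{
    assert(c != NULL);
    if (c->count == MAX_NODES)
        return false;
    c->keys[c->count] = key;
    c->count++;
    return true;
}

/* Return the index of key, or -1 when the key is not a member. */
static int container_find(const struct container *c, int key)
{
    for (size_t i = 0; i < c->count; i++)
        if (c->keys[i] == key)
            return (int)i;
    return -1;
}

/* Remove key only after confirming membership; report whether anything changed. */
static bool container_remove(struct container *c, int key)
{
    int i = container_find(c, key);
    if (i < 0)
        return false;               /* the membership test a buggy caller might omit */
    c->count--;
    c->keys[i] = c->keys[c->count]; /* fill the hole with the last entry */
    return true;
}

/*
 * Remove every entry equal to key and return how many were removed.
 * If the result of container_remove() were ignored, a key that is not a
 * member would leave count unchanged and this loop would never exit,
 * burning a full CPU while everything waiting on it appears hung.
 */
static size_t remove_all(struct container *c, int key)
{
    size_t removed = 0;
    while (c->count > 0) {
        if (!container_remove(c, key))
            break;                  /* no progress was made: stop instead of spinning */
        removed++;
    }
    return removed;
}
```

The design point is only that a loop or branch acting on a lookup result should confirm the lookup succeeded before proceeding. Whether the change in ldms.c has this exact shape is not stated in this conversation.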

I'm proposing this code fix for the issue, which we continue to encounter on Sandia systems on aggregators without the patch. It follows a debug session on tag release v4.4.3 with @tom95858.

@bschwal

@tom95858 merged commit e8635ea into ovis-hpc:b4.4 on Sep 26, 2024
14 checks passed