sambacc: add a retry loop to ctdb.monitor_cluster_meta_changes #130

phlogistonjohn · 2024-08-16T12:44:11Z

Add a loop that retries the `ctdb reloadnodes` command after an increasing delay. This is an attempt to fix a condition where ctdbd is apparently not yet ready to handle the `ctdb reloadnodes` command. In that case the command would run but fail, and the `monitor_cluster_meta_changes` function would raise an exception. The exception would be caught by the command-level retry loop. However, that command-level loop simply re-runs `monitor_cluster_meta_changes`, which no longer starts from the same initial clustermeta state and has effectively "forgotten" that it needs to run reloadnodes. The new retry loop adds a level of error handling inside `monitor_cluster_meta_changes`, so that the reload is retried with a bounded number of attempts.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
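For readers skimming the description, the pattern can be sketched roughly as below. This is only an illustration, not the code from this PR: `retry_with_backoff`, `action`, and the injectable `sleep` parameter are hypothetical names, and the only parts taken from the diff are the `tries: int = 5` default and the `1 << idx` delay applied before each attempt.

```python
import time
from typing import Callable


def retry_with_backoff(
    action: Callable[[], None],
    tries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call `action` until it succeeds; re-raise after `tries` failures.

    Hypothetical helper: `action` stands in for whatever runs
    `ctdb reloadnodes` in the real code.
    """
    for idx in range(tries):
        # Delay doubles on every attempt: 1, 2, 4, 8, ... seconds.
        sleep(1 << idx)
        try:
            action()
            return
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if idx == tries - 1:
                raise
```

Because the retry happens inside the function that noticed the change, the knowledge that a reload is still pending is not lost between attempts, which is the point made in the description above.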

synarete

LGTM. See minor comment.

synarete · 2024-08-18T06:29:06Z

sambacc/ctdb.py

+    tries: int = 5,
+) -> None:
+    for idx in range(tries):
+        time.sleep(1 << idx)


Minor: put an upper bound on the sleep time, something like `time.sleep(min(1 << idx, 60))`, just in case someone naively increases `tries` to a large value.
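To make the effect of the suggested cap concrete, here is a small sketch that lists the delays per attempt. `backoff_delays` and its `cap` parameter are hypothetical names used only for illustration; the expression `min(1 << idx, 60)` is the one proposed in the comment.

```python
from typing import List


def backoff_delays(tries: int, cap: int = 60) -> List[int]:
    """Return the sleep time in seconds before each attempt, capped at `cap`."""
    return [min(1 << idx, cap) for idx in range(tries)]


print(backoff_delays(5))  # [1, 2, 4, 8, 16]
print(backoff_delays(8))  # [1, 2, 4, 8, 16, 32, 60, 60]
```

With the default of 5 tries the cap never applies and the total wait is 31 seconds. It only matters once `tries` reaches 7 or more, where the uncapped delay would otherwise be 64 seconds and keep doubling.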

anoopcs9

lgtm.

phlogistonjohn force-pushed the jjm-ctdb-reload-err-handle branch from eb4632f to c5095b4 Compare August 16, 2024 12:51

phlogistonjohn marked this pull request as ready for review August 16, 2024 16:29

phlogistonjohn requested review from synarete and anoopcs9 August 16, 2024 16:29

synarete approved these changes Aug 18, 2024

anoopcs9 approved these changes Aug 19, 2024

mergify bot merged commit 1b72854 into samba-in-kubernetes:master Aug 19, 2024
9 checks passed

phlogistonjohn deleted the jjm-ctdb-reload-err-handle branch August 20, 2024 21:12

mergify bot added the priority-review label Sep 3, 2024
