Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patchset of chrony and NTP related fixes #4016

Merged
merged 3 commits into from
Jun 29, 2024

Conversation

rouming
Copy link
Contributor

@rouming rouming commented Jun 28, 2024

This PR contains three patches, addressing issues related to NTP and chrony.

  1. Use waitsync command for initial NTP sync: The first patch also modifies the device-steps.sh script, replacing the chronyd -q command with the waitsync command from the chronyc client for the initial NTP synchronization. This change simplifies the process by avoiding the need to parse the configuration file.

  2. Reload NTP sources without restarting daemon: The second patch updates the device-steps.sh script to reload NTP sources directly, instead of restarting the chronyd daemon. This change avoids issues with the watchdog detecting the absence of the chronyd process and firing incorrectly. The patch includes a loop to delete old sources and add new ones, due to the lack of a single command in chronyc to clear all sources. This should help to fix the chronyd.pid watchdog on recent master #4014

  3. Remove Unix socket on error path: The third patch modifies the handlentp.go to ensure that the Unix socket file is deleted if net.DialUnix() encounters an error, preventing "address already in use" errors.

…nyc cmd

There is a convenient `waitsync` command from chronyc client,
which block until first time sync happens. This can be used
instead of `chronyd -q` client mode, which is cumbersome to use
due to the configuration file parsing.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
When NTP sources are updated, chronyd needs to be notified. The easiest
way was to kill the chronyd instance and restart it with new NTP sources.
This restart seems is not very much appreciated by the watchdog, which
detects absense of the process by the pidfile and fires (see the lf-edge#4014)

I could not reproduce the issue locally, but I see the following logs
quite often:

   2024-06-28T13:23:28.680584031Z;watchdog;watchdog: pidfile: /run/chronyd.pid
   2024-06-28T13:23:43.234635519Z;pillar.out;2024-06-28T13:23:43,207263478+00:00 NTP (chronyd) server 3155 still running
   2024-06-28T13:23:43.774360721Z;watchdog;watchdog: cannot open /run/chronyd.pid (errno = 2 = 'No such file or directory')
   2024-06-28T13:23:44.77729473Z;watchdog;watchdog: cannot open /run/chronyd.pid (errno = 2 = 'No such file or directory')
   2024-06-28T13:23:45.77819751Z;watchdog;watchdog: cannot open /run/chronyd.pid (errno = 2 = 'No such file or directory')

which happen during the chronyd restart. I do not have an answer why
restart could happen so long, but anyway this is not a good approach
to restart a daemon for new NTP sources population.

This patch deletes old sources and populates new list. Unfortunately
chronyc does not provide a way to delete all sources in one line,
so this has to be implemented in a loop.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
net.DialUnix() leaves sock file created on error path and
does not delete it, this leads to the following errors:

   getNTPSourcesInfo: can't connect to chronyd: dial unixgram /run/chrony/chronyc.1506.sock /run/chrony/chronyd.sock: bind: address already in use

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try it

@rouming rouming added the bug Something isn't working label Jun 28, 2024
@eriknordmark
Copy link
Contributor

I don't see any chronyd.pid watchdogs during 24 hours of runs with these fixes.

@eriknordmark eriknordmark merged commit 0c0cc7e into lf-edge:master Jun 29, 2024
31 of 32 checks passed
@rouming rouming deleted the chronyc-fixes branch July 1, 2024 05:18
@eriknordmark
Copy link
Contributor

Fixed issue #4014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants