criu checkpoint/restore: print errors from criu log #3816
@adrianreber @avagin PTAL
Force-pushed from 10de072 to 840df2a.
Force-pushed from 840df2a to a10b9ca.
That is a good idea, if a bit wild. When using CRIU in Podman or CRI-O, the engine usually takes the error message from runc and passes it to the user together with the location of the log file. I guess it would be important to know whether users of runc's checkpoint functionality (like containerd, Podman and CRI-O) can handle this changed runc behaviour. Your log scanner only seems to run on failure, so the time added by scanning the log file should normally not be a problem. I like the idea of better error reporting to the user, but I am not sure how well runc's users (container engines) can handle multi-line error messages.
I've seen quite a few cases where runc emits multiple errors, warnings, etc., so this should not be entirely new. In any case, we will have a 1.2.0 release candidate out to test this.
LGTM
Force-pushed from 7579873 to 15c1c60.
LGTM
Force-pushed from 15c1c60 to dffd0b4.
No code change, only added periods to some comments to make godot happy.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Use "switch t", since we only check t.
2. Remove the unneeded t assignment.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When criu fails, it does not give us much context to understand what caused the error; for that, we need to look into its log file. This is somewhat complicated to do (as you can see in the parts of checkpoint.bats removed by this commit), and not very user-friendly.

Add a function to find and log errors from criu logs, together with some preceding context, in case either checkpoint or restore has failed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As we now log the log file name in logCriuErrors. While at it, there is no need to call var.String() when formatting with %s, as the fmt package does that automatically.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Force-pushed from dffd0b4 to 3867693.
LGTM
LGTM. It seems a little crazy, but I guess it's better to get some information into the actual runc log than to require users to go looking for it themselves. We will have to see whether this breaks anything in 1.2-rc1.
The alternative is to make criu report extended errors. This is somewhat complicated because, when debugging criu, some context is usually needed (to see what happened just before the error), and …
@lifubang PTAL
When criu fails, it does not give us much context to understand what caused the error; for that, we need to look into its log file.
This is somewhat complicated to do (as you can see in parts of checkpoint.bats removed by this commit), and not very convenient.
Add a function to find and log errors from criu logs, in case either checkpoint or restore has failed.
Fixes: #3711