Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

criu checkpoint/restore: print errors from criu log #3816

Merged
merged 4 commits into from
Aug 5, 2023

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Apr 6, 2023

When criu fails, it does not give us much context to understand what was the cause of an error -- for that, we need to take a look into its log file.

This is somewhat complicated to do (as you can see in parts of checkpoint.bats removed by this commit), and not very convenient.

Add a function to find and log errors from criu logs, in case either checkpoint or restore has failed.

Fixes: #3711

@kolyshkin
Copy link
Contributor Author

@adrianreber @avagin PTAL

@adrianreber
Copy link
Contributor

That is good idea. A bit wild maybe. When using CRIU in Podman or CRI-O users usually get the error message from runc and pass it to the user in combination with the location of the log file.

I guess it would be important to know if users of runc's checkpoint functionality (like containerd, Podman and CRI-O) can handle this changed runc behaviour.

Your log scanner only seems to run during a failure so the added time to scan the log file should normally not be a problem.

I like this idea of better error reporting to the user but I am not sure how well runc's user (container engines) can handle multiline error messages.

@kolyshkin
Copy link
Contributor Author

I like this idea of better error reporting to the user but I am not sure how well runc's user (container engines) can handle multiline error messages.

I've seen quite a few cases when runc emits multiple errors/warnings etc., so this should not be something that's entirely new. Surely, we'll have 1.2.0rc released to test this.

@avagin
Copy link
Contributor

avagin commented Apr 24, 2023

LGTM

@kolyshkin kolyshkin force-pushed the show-criu-errors branch 3 times, most recently from 7579873 to 15c1c60 Compare August 1, 2023 17:32
Copy link
Contributor

@rst0git rst0git left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

No code change, only added periods to some comments to make godot happy.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Use "switch t" since we only check t.

2. Remove unneeded t assignment.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When criu fails, it does not give us much context to understand what
was the cause of an error -- for that, we need to take a look into its
log file.

This is somewhat complicated to do (as you can see in parts of
checkpoint.bats removed by this commit), and not very user-friendly.

Add a function to find and log errors from criu logs, together with some
preceding context, in case either checkpoint or restore has failed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As we now log the log file name in logCriuErrors.

While at it, there is no need to use var.String() with %s as it is done
by the runtime.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@avagin
Copy link
Contributor

avagin commented Aug 3, 2023

LGTM

@kolyshkin
Copy link
Contributor Author

@adrianreber @AkihiroSuda @cyphar PTAL

Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Seems a little bit crazy but I guess it's better to get some information in the actual runc log rather than requiring the user to go looking themselves. We will have to see if this breaks anything in 1.2-rc1.

@kolyshkin
Copy link
Contributor Author

LGTM. Seems a little bit crazy but I guess it's better to get some information in the actual runc log rather than requiring the user to go looking themselves. We will have to see if this breaks anything in 1.2-rc1.

The alternative is to make criu report extended errors. This is somewhat complicated because when debugging criu some context is usually needed (to see what happened just before the error) and grep -B N log naturally provides this context. Nevertheless, maybe something better will replace this in, alas not yet, so for now this PR is (way) better than spewing the log file name.

@kolyshkin
Copy link
Contributor Author

@lifubang PTAL

@lifubang lifubang merged commit 7b1fc98 into opencontainers:main Aug 5, 2023
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement c/r error reporting
6 participants