driver: add a bootstrap timeout #2126

jsternberg · 2023-11-16T21:32:07Z

Drivers will now fail with a timeout if they take over 20 seconds to
boot. This process repeats itself 3 times for a total of a minute.

Fixes #2120.

Drivers will now fail with a timeout if they take over 20 seconds to boot. This process repeats itself 3 times for a total of a minute. Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>

tonistiigi

I don't think we should set a general limit for boot. Eg. if you are on slow connection maybe pulling buildkit image takes a lot of time, or who knows what may affect the boot time for k8s driver. I think we should concentrate on the initial report of detecting unresponsive remote server or server that is not buildkit at all.

jsternberg · 2023-11-17T17:59:28Z

We can put the cancellation on the Wait() call instead and make it driver specific so this would only apply to the remote. It might also be better to include it as an option to the driver instead of just a default parameter. That way, we're also not changing the existing behavior.

The main problem with this that we run into is that GRPC doesn't really tell us if the GRPC service is unavailable in a permanent or temporary way. We'd have to parse the error message explicitly. GRPC's reconnection mechanism also seems to only be for reconnections and doesn't apply to the initial connection.

Another option that might help might be to include a status message for the bootstrap process. Right now, it just kind of stalls and doesn't show any message about the progress of the bootstrap. It might help if we use the progress writer to at least say how long the process is taking.

tonistiigi · 2023-11-17T20:39:50Z

We can put the cancellation on the Wait() call instead and make it driver specific so this would only apply to the remote.

If you mean the client.Wait() in boot then that seems fine.

It might help if we use the progress writer to at least say how long the process is taking.

That would mean collecting such messages from the grpc level I think. I'm not sure how possible it is to capture that, and keep it in the form that it is useful for the user.

jsternberg · 2023-11-17T21:40:28Z

I've created a PR for an alternative to this one: #2130

If the service is already up, it doesn't change the output at all. If the service is invalid like in the original PR, it looks like this:

[+] Building 20.0s (1/1) FINISHED
 => ERROR [internal] waiting for connection                                                                                 20.0s
------
 > [internal] waiting for connection:
------
ERROR: context deadline exceeded

For ctrl+c:

[+] Building 0.9s (1/1) FINISHED
 => CANCELED [internal] waiting for connection                                                                               0.9s
ERROR: context canceled

jsternberg · 2023-11-17T23:00:35Z

Closing this in favor of #2130.

driver: add a bootstrap timeout

9c6ca22

Drivers will now fail with a timeout if they take over 20 seconds to boot. This process repeats itself 3 times for a total of a minute. Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>

tonistiigi reviewed Nov 16, 2023

View reviewed changes

jsternberg closed this Nov 17, 2023

jsternberg deleted the bootstrap-timeout branch November 17, 2023 23:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

driver: add a bootstrap timeout #2126

driver: add a bootstrap timeout #2126

jsternberg commented Nov 16, 2023

tonistiigi left a comment

jsternberg commented Nov 17, 2023

tonistiigi commented Nov 17, 2023

jsternberg commented Nov 17, 2023

jsternberg commented Nov 17, 2023

driver: add a bootstrap timeout #2126

driver: add a bootstrap timeout #2126

Conversation

jsternberg commented Nov 16, 2023

tonistiigi left a comment

Choose a reason for hiding this comment

jsternberg commented Nov 17, 2023

tonistiigi commented Nov 17, 2023

jsternberg commented Nov 17, 2023

jsternberg commented Nov 17, 2023