diff --git a/keps/prod-readiness/sig-node/2008.yaml b/keps/prod-readiness/sig-node/2008.yaml
index 247d06c16ce8..f1e4256bf9de 100644
--- a/keps/prod-readiness/sig-node/2008.yaml
+++ b/keps/prod-readiness/sig-node/2008.yaml
@@ -1,3 +1,5 @@
 kep-number: 2008
 alpha:
   approver: "@ehashman"
+beta:
+  approver: "@deads2k"
diff --git a/keps/sig-node/2008-forensic-container-checkpointing/README.md b/keps/sig-node/2008-forensic-container-checkpointing/README.md
index 723b3e5969a6..22023d496e77 100644
--- a/keps/sig-node/2008-forensic-container-checkpointing/README.md
+++ b/keps/sig-node/2008-forensic-container-checkpointing/README.md
@@ -125,6 +125,10 @@ message CheckpointContainerRequest {
     string container_id = 1;
     // Location of the checkpoint archive used for export/import
     string location = 2;
+    // Timeout in seconds for the checkpoint to complete.
+    // Timeout of zero means to use the CRI default.
+    // Timeout > 0 means to use the user specified timeout.
+    int64 timeout = 3;
 }
 
 message CheckpointContainerResponse {}
@@ -146,6 +150,16 @@ In its first implementation the risks are low as it tries to be a CRI API
 change with minimal changes to the kubelet and it is gated by the feature
 gate `ContainerCheckpoint`.
 
+One possible risk that was identified during Alpha is that the disk of
+the node requesting the checkpoints could fill up if too many checkpoints
+are created. One approach to solve this was some kind of garbage collection
+of checkpoint archives. A pull request to implement garbage collection
+was opened ([#115888](https://github.com/kubernetes/kubernetes/pull/115888))
+but during review it became clear that the kubelet might not be the right
+place to implement checkpoint archive garbage collection and the pull request
+was closed again. Currently the most likely solution seems to be to implement
+the garbage collection in an operator.
+
 ## Design Details
 
 The feature gate `ContainerCheckpoint` will ensure that the API
@@ -244,13 +258,29 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 Once CRI implementation provide the relevant RPC calls the e2e tests will not fail but need to be extended.
 
+- Once the initial Alpha release CRI-O supports the
+  `CheckpointContainer` CRI RPC and tests have been
+  enhanced to support CRI implementation that implement
+  the `CheckpointContainer` CRI RPC
+
+- Once Kubernetes was released with the `CheckpointContainer` CRI RPC
+  CRI-O has been updated to support the new CRI RPC.
+  The tests have been enhanced to work with CRI implementations
+  that support the `CheckpointContainer` CRI RPC as well as
+  CRI implementations that do not support it. The tests also handle
+  if the corresponding feature gate is disabled or enabled:
+
+
 ### Graduation Criteria
 
 #### Alpha
 
-- [ ] Implement the new feature gate and kubelet implementation
-- [ ] Ensure proper tests are in place
-- [ ] Update documentation to make the feature visible
+- [X] Implement the new feature gate and kubelet implementation
+- [X] Ensure proper tests are in place
+- [X] Update documentation to make the feature visible
+  -
+  -
+  -
 
 #### Alpha to Beta Graduation
 
@@ -350,6 +380,7 @@ does not compress the checkpoint archive on disk.
 * 2022-01-20: Reworked based on review and renamed feature gate to `ContainerCheckpoint`
 * 2022-04-05: Added CRI API section and targeted 1.25
 * 2022-05-17: Remove *restore* RPC from the CRI API
+* 2023-10-09: Beta graduation in 1.30
 
 ## Drawbacks
 
diff --git a/keps/sig-node/2008-forensic-container-checkpointing/kep.yaml b/keps/sig-node/2008-forensic-container-checkpointing/kep.yaml
index b75942e82c88..e40ec32dfd0c 100644
--- a/keps/sig-node/2008-forensic-container-checkpointing/kep.yaml
+++ b/keps/sig-node/2008-forensic-container-checkpointing/kep.yaml
@@ -15,18 +15,18 @@ approvers:
   - "@dchen1107"
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.25"
+latest-milestone: "v1.30"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.25"
-  beta: "v1.26"
-  stable: "v1.28"
+  beta: "v1.30"
+  stable: "v1.33"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled