
Ensure resizer handles double processing of PVCs without problems. #4

Closed
gnufied opened this issue Jan 7, 2019 · 9 comments

Comments

gnufied commented Jan 7, 2019

xref - kubernetes/kubernetes#71611

mlmhl commented Jan 8, 2019

/assign

@k8s-ci-robot

@mlmhl: GitHub didn't allow me to assign the following users: mlmhl.

Note that only kubernetes-csi members and repo collaborators can be assigned.
For more information please see the contributor guide

In response to this:

/assign

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mlmhl commented Jan 8, 2019

It seems that the race condition mentioned in kubernetes/kubernetes#71470 (comment) does not exist in external-resizer. The in-tree expand controller uses a custom sync mechanism to resync PVCs, which is the root of the problem. In external-resizer, however, we use the SharedInformer's resync mechanism. With this mechanism, the SharedInformer periodically sends us Update events in which the old object and the new object are identical, so the check inside the update handler intercepts these events.
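For reference, here is a minimal sketch of the kind of check I mean (the names are illustrative, not the exact external-resizer code). On a resync the informer hands back the same object as both old and new, so comparing resource versions filters those events out:

```go
package resizer

import (
	v1 "k8s.io/api/core/v1"
)

// pvcUpdateNeedsProcessing sketches the check an informer UpdateFunc can
// use to drop SharedInformer resync events: on a periodic resync the
// informer delivers the same object as both oldObj and newObj, so their
// resource versions (and everything else) are identical.
func pvcUpdateNeedsProcessing(oldObj, newObj interface{}) bool {
	oldPVC, ok := oldObj.(*v1.PersistentVolumeClaim)
	if !ok {
		return false
	}
	newPVC, ok := newObj.(*v1.PersistentVolumeClaim)
	if !ok {
		return false
	}
	// Identical resource versions => this is a resync, not a real update.
	return oldPVC.ResourceVersion != newPVC.ResourceVersion
}
```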

However, while reviewing the code, I found that this mechanism introduces another race condition. Consider the following example:

  1. User requests size 1Gi -> 2Gi
  2. Resizer resizes the volume and adds a PersistentVolumeClaimFileSystemResizePending condition
  3. Kubelet starts to resize file system
  4. User requests size 2Gi -> 3Gi
  5. Resizer resizes the volume and updates the PersistentVolumeClaimFileSystemResizePending condition.
  6. Kubelet finishes the first resize request and clears the conditions.
  7. Kubelet performs the second resize request.

Although the second resize request can finish in step 7, the PVC condition is wrong between step 6 and step 7 (wanted PersistentVolumeClaimFileSystemResizePending, got empty), because we won't process the PVC again (due to the check mentioned above).

To resolve this problem, I think we need to remove the check mentioned above and add a version check mechanism like kubernetes/kubernetes#71611; a rough sketch of what I mean follows. @gnufied Do you have any other ideas?
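Roughly what I have in mind, as a sketch only (the type and function names here are mine, not the actual kubernetes/kubernetes#71611 patch): remember the newest resourceVersion already handled for each PVC and skip any copy that is not newer, so a stale cached object cannot be processed a second time.

```go
package resizer

import (
	"strconv"
	"sync"

	v1 "k8s.io/api/core/v1"
)

// resizeProcessStatus remembers the newest resourceVersion already
// handled per PVC, so an older (stale) copy of the same PVC delivered
// later is ignored instead of being resized again.
type resizeProcessStatus struct {
	mu        sync.Mutex
	processed map[string]uint64 // PVC UID -> last processed resourceVersion
}

func newResizeProcessStatus() *resizeProcessStatus {
	return &resizeProcessStatus{processed: map[string]uint64{}}
}

// shouldProcess returns true only if this PVC's resourceVersion is newer
// than anything already handled for it.
func (s *resizeProcessStatus) shouldProcess(pvc *v1.PersistentVolumeClaim) bool {
	rv, err := strconv.ParseUint(pvc.ResourceVersion, 10, 64)
	if err != nil {
		// Resource versions are opaque strings; if parsing fails, fall back
		// to processing the PVC rather than silently dropping it.
		return true
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	last, ok := s.processed[string(pvc.UID)]
	if ok && rv <= last {
		return false
	}
	s.processed[string(pvc.UID)] = rv
	return true
}
```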

gnufied commented Jan 9, 2019

Yeah, I agree. I think we need the resourceVersion check added in kubernetes/kubernetes#71611 in external-resizer as well.

pohly pushed a commit to pohly/external-resizer that referenced this issue Mar 4, 2019
build.make: fix pushing of "canary" image from master branch
pohly pushed a commit to pohly/external-resizer that referenced this issue Mar 6, 2019
build.make: fix pushing of "canary" image from master branch
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Apr 28, 2019
mlmhl commented Apr 30, 2019

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Apr 30, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jul 29, 2019
mlmhl commented Jul 30, 2019

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Jul 30, 2019
mlmhl commented Jul 30, 2019

@gnufied As PR #6 has already been merged, could we close this issue?

gnufied closed this as completed on Jul 30, 2019