Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding restart kubelet flag on e2e test #97028

Merged
merged 1 commit into from
Jun 23, 2021

Conversation

knabben
Copy link
Member

@knabben knabben commented Dec 3, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:
Adding a --restart-kubelet flag on Node E2E suite, the flag can be used to control the restart/not of Kubelet when it is killed.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
Kubelet --exit-on-lock-contention will exit Kubelet when a new lock file is created, this flag is used in the suite to not bring this up via systemctl.

Does this PR introduce a user-facing change?:

Adding `--restart-kubelet` flag on E2E Node test suite

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework labels Dec 3, 2020
@k8s-ci-robot k8s-ci-robot added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 3, 2020
@knabben
Copy link
Member Author

knabben commented Dec 3, 2020

/test pull-kubernetes-e2e-gce-ubuntu-containerd

@@ -221,6 +221,8 @@ type NodeTestContextType struct {
// the node e2e test. If empty, the default one (system.DefaultSpec) is
// used. The system specs are in test/e2e_node/system/specs/.
SystemSpecName string
// RestartKubelet restarts Kubelet unit when the process is killed.
RestartKubelet bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we should make the framework test context aware of such an option.
in fact a lot of options should be trimmed out of it if we eventually want to treat it as an API.

/cc @oomichi @alejandrox1

Copy link
Member Author

@knabben knabben Dec 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neolit123 As you stated, this still can be applied to other options, only used on Node E2E scope, inside the NodeTestContextType and even TestContextType directly. I don't see any other immediate place (used in the Node E2E suite) to set up and share the variable without reading the flag on init(), any suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given a number of flags already use the framework context for storage, this one can go there as well.

flags.BoolVar(&framework.TestContext.NodeConformance, "conformance", false, "If true, the test suite will not start kubelet, and fetch system log (kernel, docker, kubelet log etc.) to the report directory.")
flags.BoolVar(&framework.TestContext.PrepullImages, "prepull-images", true, "If true, prepull images so image pull failures do not cause test failures.")
flags.BoolVar(&framework.TestContext.RestartKubelet, "restart-kubelet", true, "If true, restart Kubelet unit when the process is killed.")
flags.StringVar(&framework.TestContext.ImageDescription, "image-description", "", "The description of the image which the test will be running on.")
flags.StringVar(&framework.TestContext.SystemSpecName, "system-spec-name", "", "The name of the system spec (e.g., gke) that's used in the node e2e test. The system specs are in test/e2e_node/system/specs/. This is used by the test framework to determine which tests to run for validating the system requirements.")
flags.Var(cliflag.NewMapStringString(&framework.TestContext.ExtraEnvs), "extra-envs", "The extra environment variables needed for node e2e tests. Format: a list of key=value pairs, e.g., env1=val1,env2=val2")
flags.StringVar(&framework.TestContext.SriovdpConfigMapFile, "sriovdp-configmap-file", "", "The name of the SRIOV device plugin Config Map to load.")

but long term this should be decoupled.
ideally the PR should pass review by SIG Node / e2e_node owners.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the ln#49-76,

TestContextType contains test settings and global state. It is due to historical reasons that it is a mixture of items managed by the test framework itself, cloud providers and individual tests, the end goal is to move anything not required by the framework into the code which uses the settings.

The recommendation for those settings is:

  • They are stored in their own context structure or local variables. (In test/e2e_node/e2e_node_suite_test.go as suggested)
  • The standard flag package is used to register them.
  • The flag name should follow the pattern <part1>.<part2>....<partn> where the prefix is unlikely to conflict with other tests or standard packages and each part is in lower camel case. (eg: node.kubelet.restartOnExit ?)

WDYT?

Copy link
Member Author

@knabben knabben May 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SergeyKanzhelev this is an open discussion, do reusing the TestContext for node tests a topic to be reviewed? Should we start on this PR?

I think @alejandrox1 started a document for this refactoring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the suggested refactoring, but I agree it is outside the scope of this PR. I think @neolit123 agreed with this. @ike-ma any concerns with making refactoring separately?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SergeyKanzhelev this is an open discussion, do reusing the TestContext for node tests to be reviewed, starting on this PR?

I think @alejandrox1 started a document for this refactoring.

@alejandrox1 @knabben Do we have an issue for the refactoring?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -80,6 +80,7 @@ func registerNodeFlags(flags *flag.FlagSet) {
// It is hard and unnecessary to deal with the complexity inside the test suite.
flags.BoolVar(&framework.TestContext.NodeConformance, "conformance", false, "If true, the test suite will not start kubelet, and fetch system log (kernel, docker, kubelet log etc.) to the report directory.")
flags.BoolVar(&framework.TestContext.PrepullImages, "prepull-images", true, "If true, prepull images so image pull failures do not cause test failures.")
flags.BoolVar(&framework.TestContext.RestartKubelet, "restart-kubelet", true, "If true, restart Kubelet unit when the process is killed.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems fine as long as sig node are OK with the change.
@kubernetes/sig-node-pr-reviews

Adding --restart-kubelet flag on E2E Node test suite

please explain what the flag does in the release note too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the flag was already approved here:
#96775 (review)

@@ -302,6 +302,7 @@ func (e *E2EServices) startKubelet() (*server, error) {
}

cmd := exec.Command(cmdArgs[0], cmdArgs[1:]...)
restartOnExit := framework.TestContext.RestartKubelet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can another variable be used to store the flag state instead of framework.TestContext.RestartKubelet ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the flag from the file init is one option, I'm not sure if is a pattern we want to propagate here:

func init() {
	flag.Var(&kubeletArgs, "kubelet-flags", "Kubelet flags passed to kubelet, this will override default kubelet flags in the test. Flags specified in multiple kubelet-flags will be concatenate.")
	flag.BoolVar(&genKubeletConfigFile, "generate-kubelet-config-file", true, "The test runner will generate a Kubelet config file containing test defaults instead of passing default flags to the Kubelet.")
 	flag.BoolVar(&restartKubelet, "restart-kubelet", true, "If true, restart Kubelet unit when the process is killed.")

}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be the recommended way to do init() here and use a local variable for the restart-kubelet flag based on test/e2e/framework/test_context.go ln#49-76?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is to decouple specific node flags from the main e2e framework context. But adding these flags on the init() here without further context it's only augmenting the mess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neolit123 we opened an issue to move this TestContext.RestartKubelet to a proper place, for this specific PR, we are keeping in the global one, wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

@knabben
Copy link
Member Author

knabben commented Dec 21, 2020

/retest

@knabben knabben closed this Feb 20, 2021
@knabben
Copy link
Member Author

knabben commented Mar 8, 2021

/reopen

@k8s-ci-robot k8s-ci-robot removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 17, 2021
@SergeyKanzhelev
Copy link
Member

/assign @ike-ma

@k8s-ci-robot
Copy link
Contributor

@SergeyKanzhelev: GitHub didn't allow me to assign the following users: ike-ma.

Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @ike-ma

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rata
Copy link
Member

rata commented Apr 20, 2021

Friendly ping? :)

@ike-ma
Copy link
Contributor

ike-ma commented Apr 20, 2021

Ack. Review in progress.

@SergeyKanzhelev
Copy link
Member

/test pull-kubernetes-node-e2e-containerd

What's the current status of this PR?

@SergeyKanzhelev
Copy link
Member

/assign @cynepco3hahue

to help @ike-ma to review

@cynepco3hahue
Copy link

I currently do not see a lot of issues adding the flag under the TestContextType because the flag affects the test infrastructure behavior and does not affect the kubelet configuration. But it can be nice to add TODO and issue number that will point to the fact that we want to have a separate context for e2e node variables.

@cynepco3hahue
Copy link

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 18, 2021
@oomichi
Copy link
Member

oomichi commented Jun 22, 2021

Sorry for my late response.
This PR doesn't change the test behavior by default and adds a flag for restarting.
That seems straightforward for me.

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: knabben, oomichi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 22, 2021
@knabben
Copy link
Member Author

knabben commented Jun 23, 2021

/test pull-kubernetes-e2e-gce-ubuntu-containerd

@k8s-ci-robot
Copy link
Contributor

@knabben: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gce-ubuntu-containerd 00da68d link /test pull-kubernetes-e2e-gce-ubuntu-containerd

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot merged commit af60beb into kubernetes:master Jun 23, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 23, 2021
ipochi added a commit to ipochi/test-infra that referenced this pull request Jul 9, 2021
This flag was added to the test suite in
kubernetes/kubernetes#97028

However as it turns out, in order to test the exit on lock contention,
it is not needed, hence removing the flag from the test job.

Signed-off-by: Imran Pochi <imran@kinvolk.io>
ipochi added a commit to ipochi/test-infra that referenced this pull request Aug 11, 2021
This flag was added to the test suite in
kubernetes/kubernetes#97028

However as it turns out, in order to test the exit on lock contention,
it is not needed, hence removing the flag from the test job.

Signed-off-by: Imran Pochi <imran@kinvolk.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Archived in project
Development

Successfully merging this pull request may close these issues.

9 participants