
Add kubevirt hypervisor #3841

Merged: 1 commit merged into lf-edge:master on Apr 26, 2024

Conversation


@zedi-pramodh commented Apr 3, 2024

  1. Add Kubevirt hypervisor type in hypervisor.go
  2. Since both kvm and kubevirt use /dev/kvm, we filter the enabled hypervisor by the flag isHVTypeKube (see the sketch below)
  3. kubevirt.go implements the hypervisor interface for the kubevirt hypervisor
  4. Support for VMs, containers launched as VMs, and launching containers as Kubernetes pods when virtualization type NOHYPER is chosen
  5. Host-device PCIe passthrough is supported for NVMe disks, USB, and network interfaces
  6. Add hvTypeKube to the domainmgr context
  7. For the kubevirt type, domainmgr waits until Kubernetes is ready and cleans up any stale VM instances during startup
  8. Metrics collection uses the Kubernetes metrics mechanism for both VMs and containers (pods), but keeps the existing domainmetrics structures to stay consistent with other flavors of EVE
  9. Added three more states in types.go: PENDING, SCHEDULING, and FAILED

NOTE: This PR also contains code written by @naiming-zededa, especially the pod-specific parts and the metrics collection.
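To illustrate point 2, a minimal sketch of how the /dev/kvm ambiguity can be resolved, reusing the identifiers from pkg/pillar/hypervisor that appear later in this review (hypervisorPriority, knownHypervisors, base.IsHVTypeKube); the skip conditions are illustrative, not the literal merged code:

isHVTypeKube := base.IsHVTypeKube() // true only on the kubevirt flavor of EVE
for _, v := range hypervisorPriority {
	// dom0handle is /proc/xen for xen and /dev/kvm for both kvm and kubevirt
	if _, err := os.Stat(knownHypervisors[v].dom0handle); err != nil {
		continue
	}
	// /dev/kvm alone cannot distinguish kvm from kubevirt, so use the flag
	if v == KubevirtHypervisorName && !isHVTypeKube {
		continue
	}
	if v == KVMHypervisorName && isHVTypeKube {
		continue
	}
	enabled = append(enabled, v)
}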

@zedi-pramodh changed the title from "Add kubevirt hyperisor" to "Add kubevirt hypervisor" on Apr 3, 2024

@deitch left a comment


Mostly several questions to deepen understanding, a suggestion or two.

Resolved review threads:
pkg/pillar/hypervisor/hypervisor.go (outdated)
pkg/pillar/hypervisor/hypervisor.go
pkg/pillar/hypervisor/nokube.go
pkg/pillar/hypervisor/pci.go
pkg/pillar/kubeapi/kubeapi.go
pkg/pillar/hypervisor/kubevirt.go (resolved thread):
var i int
var status string
var err error
for {
Contributor:

What is the reason to check 6 times? It sounds arbitrary.

Moreover, why is an endless loop needed?

Contributor:

Yes, we should put some comments around this, and maybe define a const for the total wait time.

Contributor:

Thx. That sounds good.

@zedi-pramodh (author):

@naiming-zededa, your response is needed here.

Contributor:

@zedi-pramodh can you define a const:
const WaitForPodCheckCounter = 5 // we check every 15 seconds; don't wait too long, so as not to trigger the watchdog

and also, before the 'for {', add a comment:
// wait for pod to be in running state, sometimes can take long, but we only wait for
// about a minute in order not to cause watchdog action

and also in 'if i > 5 {', change it to use the const.
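A rough sketch of what that suggestion could look like in kubevirt.go; the pod-status helper and variable names here are placeholders, not the actual code:

// WaitForPodCheckCounter bounds the readiness polls: we check every 15 seconds,
// so roughly a minute in total, in order not to trigger the watchdog.
const WaitForPodCheckCounter = 5

// wait for pod to be in running state, sometimes can take long, but we only wait
// for about a minute in order not to cause watchdog action
i := 0
for {
	status, err := lookupPodStatus(podName) // placeholder for the real status lookup
	if err == nil && status == "Running" {
		break
	}
	if i > WaitForPodCheckCounter {
		// snippet sits inside the pod-start path, so we return the error to the caller
		return fmt.Errorf("pod %s not running after %d checks: %v", podName, i, err)
	}
	i++
	time.Sleep(15 * time.Second)
}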

@zedi-pramodh (author):

Ack, this will be in the next commit.

Contributor:

Note that we shouldn't have to worry about watchdogs here. See other comments.

@zedi-pramodh force-pushed the add-kubevirt-hyperisor branch 2 times, most recently from ddf251d to eced2a3 on April 15, 2024

@eriknordmark left a comment


I think there is a latent log.Fatal in here, plus a number of other suggestions from the reviewers.

@zedi-pramodh (author):

I agree with you @christoph-zededa, but for these PRs all test cases are the same as the existing test cases, because nothing changes from the EVE API perspective. When we get to the clustering part, we will definitely need to incorporate some tests.

@@ -43,14 +43,15 @@ type hypervisorDesc struct {
var knownHypervisors = map[string]hypervisorDesc{
XenHypervisorName: {constructor: newXen, dom0handle: "/proc/xen", hvTypeFileContent: "xen"},
KVMHypervisorName: {constructor: newKvm, dom0handle: "/dev/kvm", hvTypeFileContent: "kvm"},
KubevirtHypervisorName: {constructor: newKubevirt, dom0handle: "/dev/kvm", hvTypeFileContent: "kubevirt"},
Contributor:

It would be a lot easier to read and understand this if we replaced the dom0handle string with an enabled func. For the existing ones that func would Stat a file, and for kubevirt it would be base.IsHVTypeKube().

The functions can even be inlined, as in:
constructor: newXen, enabled: func() bool { _, err := os.Stat("/proc/xen"); return err == nil }, hvTypeFileContent: "xen"}

@@ -93,9 +94,18 @@ func BootTimeHypervisor() Hypervisor {
// the advice of this function and always ask for the enabled one.
func GetAvailableHypervisors() (all []string, enabled []string) {
all = hypervisorPriority
isHVTypeKube := base.IsHVTypeKube()
for _, v := range all {
if _, err := os.Stat(knownHypervisors[v].dom0handle); err == nil {
Contributor:

With the above suggestion this would become
if knownHypervisors[v].enabled() {
enabled = append(enabled, v)
}
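Putting the two suggestions together, a sketch of how the descriptor table and the lookup could end up; the enabled field, its conditions for kvm and kubevirt, and the constructor signature are assumptions for illustration, not merged code:

type hypervisorDesc struct {
	constructor       func() Hypervisor
	enabled           func() bool
	hvTypeFileContent string
}

var knownHypervisors = map[string]hypervisorDesc{
	XenHypervisorName: {constructor: newXen,
		enabled:           func() bool { _, err := os.Stat("/proc/xen"); return err == nil },
		hvTypeFileContent: "xen"},
	KVMHypervisorName: {constructor: newKvm,
		enabled:           func() bool { _, err := os.Stat("/dev/kvm"); return err == nil && !base.IsHVTypeKube() },
		hvTypeFileContent: "kvm"},
	KubevirtHypervisorName: {constructor: newKubevirt,
		enabled:           func() bool { _, err := os.Stat("/dev/kvm"); return err == nil && base.IsHVTypeKube() },
		hvTypeFileContent: "kubevirt"},
}

// ...and the loop in GetAvailableHypervisors simply becomes:
for _, v := range all {
	if knownHypervisors[v].enabled() {
		enabled = append(enabled, v)
	}
}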

@zedi-pramodh (author):

I will submit a new PR for these suggestions.

@zedi-pramodh (author):

#3892 submitted

@zedi-pramodh (author):

Added two more commits: one for all the vendor files that come from bumping the eve-api version, and the other addressing all review comments up to 04/23.


@eriknordmark left a comment


More comments

pkg/pillar/hypervisor/kubevirt.go (outdated):
}

// Create the VM
i := 5
Contributor:

Domainmgr is more patient than that. It has a separate goroutine for each DomainConfig, which handles the create, modify, and delete operations for that task. Only the main loop in domainmgr has the watchdog hook.

You can see that in the delete/inactivate path, where domainmgr can wait for minutes for a VM to halt.

So there is no reason for you not to wait forever here, at least not for watchdog reasons. HOWEVER, domainmgr has its own logic for retrying to start a task if it failed to boot; that uses a configurable timer (maybe 60 seconds by default?).

So the question is what type of failures you saw that made you add a retry here, and whether we can simplify this and use the existing retry in domainmgr.
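A schematic of that split, purely illustrative (none of these names are the real domainmgr identifiers): per-DomainConfig goroutines do the slow work, and only the main loop touches the watchdog.

for {
	select {
	case cfg := <-domainConfigChanges:
		// Each DomainConfig is handled in its own goroutine, so create, modify,
		// and delete can block for minutes (e.g. waiting for a VM to halt).
		go handleDomainTask(cfg)
	case <-stillRunning.C:
		// Periodic tick; nothing slow happens here.
	}
	// Only this main loop reports liveness, so long waits inside a task
	// goroutine cannot starve the watchdog.
	touchWatchdog()
}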

Comment on lines +1060 to +1057
// wait for pod to be in running state, sometimes can take long, but we only wait for
// about a minute in order not to cause watchdog action
Contributor:

See above. I am assuming it is only the create and modify functions in domainmgr which call this, so it can wait forever (but you might want to check for any errors while waiting forever).

@zedi-pramodh (author):

AFAIR, we retry here to make sure Kubernetes is up and running after a reboot; that can take more than 2 minutes or so. The number of retries in domainmgr may not be sufficient, and we do not want to generically bump up the retries in domainmgr.

@zedi-pramodh (author):

Maybe we can retry forever as long as the error is not 'kubernetes not ready'.

Contributor:

Alternatively, if the issue is about k3s not being up after a reboot, is there a way we can have, e.g., domainmgr wait for it to be fully up and running? Is there a way to check whether it is ready?

If that is the case, then we can return all failures to the caller (and domainmgr will retry the boot after some time using the generic code).

@zedi-pramodh (author) commented Apr 25, 2024:

I am sorry, it's been a long time since we wrote this code. We are looping for the pod to be ready, not for k3s. For k3s readiness we already handle that in domainmgr when it starts: we call kubeapi.WaitForKubernetes(), which should ensure k3s is ready.

This additional loop seems to be there to make sure that a launched VM actually reached a ready status; it generally takes some time for Kubernetes to schedule the VMI and start it. @naiming-zededa, can you respond to this question?
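For reference, a hedged sketch of that startup-side ordering; kubeapi.WaitForKubernetes is named above, but the signature and the cleanup helper shown here are assumptions, not the actual domainmgr code:

// On the kubevirt flavor, domainmgr blocks at startup until k3s is reachable,
// then removes any stale VM instances left over from before the reboot.
if ctx.hvTypeKube {
	// Signature assumed for illustration; the real helper lives in pkg/pillar/kubeapi.
	if err := kubeapi.WaitForKubernetes(agentName, ps, stillRunning); err != nil {
		log.Errorf("WaitForKubernetes failed: %v", err)
		return err
	}
	cleanupStaleVMIs(ctx) // placeholder for the stale-instance cleanup
}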

@eriknordmark:

Note that the go tests failed with
=== Failed
=== FAIL: hypervisor TestGetAvailableHypervisors (0.00s)
hypervisor_test.go:32: wrong list of available hypervisors: ["xen" "kvm" "kubevirt" "acrn" "containerd" "null"] vs. ["xen" "kvm" "acrn" "containerd" "null"]

@zedi-pramodh (author):

Ah, looks like I need to fix that test file to include kubevirt.
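The fix is presumably just the expected list in hypervisor_test.go growing a "kubevirt" entry; a sketch, with the surrounding test body assumed (needs the "reflect" and "testing" imports):

func TestGetAvailableHypervisors(t *testing.T) {
	// kubevirt now sits between kvm and acrn in the priority order
	expected := []string{"xen", "kvm", "kubevirt", "acrn", "containerd", "null"}
	all, _ := GetAvailableHypervisors()
	if !reflect.DeepEqual(all, expected) {
		t.Errorf("wrong list of available hypervisors: %v vs. %v", all, expected)
	}
}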


@eriknordmark left a comment


Once the build works we should run this through the tests.
And before merging, please squash the commits into a set that makes sense for the future: one commit to update the API and a second one with all of the code might make sense.

@zedi-pramodh (author):

Sure, I will squash all the commits once I submit one more commit for the latest review comments.

@eriknordmark:

Apparently there are also DCO issues - see https://github.com/lf-edge/eve/pull/3841/checks?check_run_id=24229268647

@eriknordmark:

The build appears to be failing due to
#0 0.059 /newlog/go.mod:5: unknown directive: toolchain

@zedi-pramodh (author):

> Apparently there are also DCO issues - see https://github.com/lf-edge/eve/pull/3841/checks?check_run_id=24229268647

The DCO issues seem to come from using the 'commit suggestion' option. I think I will git-squash everything into one commit; that should fix it.

@zedi-pramodh (author):

> The build appears to be failing due to #0 0.059 /newlog/go.mod:5: unknown directive: toolchain

No clue what this one is. Maybe it came in after bumping to the new eve-api.
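For context, the toolchain directive only became valid in go.mod with Go 1.21, and older go commands reject it with exactly this "unknown directive" message. Something along these lines in newlog/go.mod would trigger it while the build container still ships an older Go (module path and versions here are illustrative):

module github.com/lf-edge/eve/newlog

go 1.21

// Pre-1.21 go commands fail here with "unknown directive: toolchain"
toolchain go1.21.0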

    1) Add Kubevirt hypervisor type in hypervisor.go
    2) Since both kvm and kubevirt use /dev/kvm, we filter the enabled hypervisor by the flag isHVTypeKube
    3) kubevirt.go implements the hypervisor interface for the kubevirt hypervisor
    4) Support for VMs, containers launched as VMs, and launching containers as Kubernetes pods when virtualization type NOHYPER is chosen
    5) Host-device PCIe passthrough is supported for NVMe disks, USB, and network interfaces
    6) Add hvTypeKube to the domainmgr context
    7) For the kubevirt type, domainmgr waits until Kubernetes is ready and cleans up any stale VM instances during startup
    8) Metrics collection uses the Kubernetes metrics mechanism for both VMs and containers (pods), but keeps the existing domainmetrics
       structures to stay consistent with other flavors of EVE
    9) Added three more states in types.go: PENDING, SCHEDULING, and FAILED

    NOTE: This PR contains code written by Naiming Shen too, especially the pod-specific parts and the metrics collection.

    10) This commit is a squash of all the commits after addressing review comments

Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>
@zedi-pramodh (author):

Git-squashed to a single commit and fixed the DCO and the newlog build issue.


@eriknordmark left a comment


Run eden

@eriknordmark mentioned this pull request Apr 26, 2024
@eriknordmark merged commit 445c0ad into lf-edge:master Apr 26, 2024
31 of 36 checks passed
@christoph-zededa:

> I agree with you @christoph-zededa, but for these PRs all test cases are the same as the existing test cases, because nothing changes from the EVE API perspective. When we get to the clustering part, we will definitely need to incorporate some tests.

@zedi-pramodh, for example, if I add the following code:

diff --git a/pkg/pillar/hypervisor/kubevirt.go b/pkg/pillar/hypervisor/kubevirt.go
index 799526e43..fae92297f 100644
--- a/pkg/pillar/hypervisor/kubevirt.go
+++ b/pkg/pillar/hypervisor/kubevirt.go
@@ -91,6 +91,7 @@ var excludedMetrics = map[string]struct{}{
 type kubevirtMetrics map[string]types.DomainMetric
 
 func (metrics *kubevirtMetrics) fill(domainName, metricName string, value interface{}) {
+       panic("This never panics under tests")
        r, ok := (*metrics)[domainName]
        if !ok {
                // Index is not valid

and then run go test ./..., it does not fail; so this means the code is not covered by the existing test cases.
