
Add kubevirt hypervisor #3841

Merged: 1 commit merged into lf-edge:master on Apr 26, 2024

Conversation


@zedi-pramodh commented Apr 3, 2024

  1. Add Kubevirt hypervisor type in hypervisor.go
  2. Since both kvm and kubevirt use /dev/kvm, we filter the enabled hypervisor by the flag isHVTypeKube (see the sketch below)
  3. kubevirt.go implements the hypervisor interface for the kubevirt hypervisor
  4. Support for VMs, containers launched as VMs, and launching containers as Kubernetes pods when virtualization type NOHYPER is chosen
  5. Host-device PCIe passthrough is supported for NVMe disks, USB, and network interfaces
  6. Add hvTypeKube to the domainmgr context
  7. For the kubevirt type, domainmgr waits until Kubernetes is ready and cleans up any stale VM instances during startup
  8. Metrics collection uses the Kubernetes metrics mechanism for both VMs and containers (pods), but keeps the existing domainmetrics structures to stay consistent with other flavors of EVE
  9. Added three more states in types.go: PENDING, SCHEDULING, and FAILED

NOTE: This PR also contains code written by @naiming-zededa, especially the pod-specific parts and the metrics collection.
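To illustrate point 2, a minimal sketch of how the /dev/kvm ambiguity can be resolved, reusing the identifiers from pkg/pillar/hypervisor that appear later in this review (hypervisorPriority, knownHypervisors, base.IsHVTypeKube); the skip conditions are illustrative, not the literal merged code:

isHVTypeKube := base.IsHVTypeKube() // true only on the kubevirt flavor of EVE
for _, v := range hypervisorPriority {
	// dom0handle is /proc/xen for xen and /dev/kvm for both kvm and kubevirt
	if _, err := os.Stat(knownHypervisors[v].dom0handle); err != nil {
		continue
	}
	// /dev/kvm alone cannot distinguish kvm from kubevirt, so use the flag
	if v == KubevirtHypervisorName && !isHVTypeKube {
		continue
	}
	if v == KVMHypervisorName && isHVTypeKube {
		continue
	}
	enabled = append(enabled, v)
}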

@zedi-pramodh changed the title from "Add kubevirt hyperisor" to "Add kubevirt hypervisor" on Apr 3, 2024

@deitch left a comment


Mostly several questions to deepen understanding, a suggestion or two.

Resolved review threads:
pkg/pillar/hypervisor/hypervisor.go (outdated)
pkg/pillar/hypervisor/hypervisor.go
pkg/pillar/hypervisor/nokube.go
pkg/pillar/hypervisor/pci.go
pkg/pillar/kubeapi/kubeapi.go
pkg/pillar/hypervisor/kubevirt.go (resolved thread):
var i int
var status string
var err error
for {
Contributor:

What is the reason to check 6 times? It sounds arbitrary.

Moreover, why is an endless loop needed?

Contributor:

Yes, we should put some comments around this, and maybe define a const for the total wait time.

Contributor:

Thx. That sounds good.

@zedi-pramodh (author):

@naiming-zededa, your response is needed here.

Contributor:

@zedi-pramodh can you define a const:
const WaitForPodCheckCounter = 5 // we check every 15 seconds; don't wait too long, so as not to trigger the watchdog

and also, before the 'for {', add a comment:
// wait for pod to be in running state, sometimes can take long, but we only wait for
// about a minute in order not to cause watchdog action

and also in 'if i > 5 {', change it to use the const.
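A rough sketch of what that suggestion could look like in kubevirt.go; the pod-status helper and variable names here are placeholders, not the actual code:

// WaitForPodCheckCounter bounds the readiness polls: we check every 15 seconds,
// so roughly a minute in total, in order not to trigger the watchdog.
const WaitForPodCheckCounter = 5

// wait for pod to be in running state, sometimes can take long, but we only wait
// for about a minute in order not to cause watchdog action
i := 0
for {
	status, err := lookupPodStatus(podName) // placeholder for the real status lookup
	if err == nil && status == "Running" {
		break
	}
	if i > WaitForPodCheckCounter {
		// snippet sits inside the pod-start path, so we return the error to the caller
		return fmt.Errorf("pod %s not running after %d checks: %v", podName, i, err)
	}
	i++
	time.Sleep(15 * time.Second)
}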

@zedi-pramodh (author):

Ack, this will be in the next commit.

Contributor:

Note that we shouldn't have to worry about watchdogs here. See other comments.

@zedi-pramodh force-pushed the add-kubevirt-hyperisor branch 2 times, most recently from ddf251d to eced2a3 on April 15, 2024

@eriknordmark left a comment


I think there is a latent log.Fatal in here, plus a number of other suggestions from the reviewers.

@zedi-pramodh (author):

I agree with you @christoph-zededa, but for these PRs all test cases are the same as the existing test cases, because nothing changes from the EVE API perspective. When we get to the clustering part, we will definitely need to incorporate some tests.

@@ -43,14 +43,15 @@ type hypervisorDesc struct {
var knownHypervisors = map[string]hypervisorDesc{
XenHypervisorName: {constructor: newXen, dom0handle: "/proc/xen", hvTypeFileContent: "xen"},
KVMHypervisorName: {constructor: newKvm, dom0handle: "/dev/kvm", hvTypeFileContent: "kvm"},
KubevirtHypervisorName: {constructor: newKubevirt, dom0handle: "/dev/kvm", hvTypeFileContent: "kubevirt"},
Contributor:

It would be a lot easier to read and understand this if we replaced the dom0handle string with an enabled func. For the existing ones that func would Stat a file, and for kubevirt it would be base.IsHVTypeKube().

The functions can even be inlined, as in:
constructor: newXen, enabled: func() bool { _, err := os.Stat("/proc/xen"); return err == nil }, hvTypeFileContent: "xen"}

@@ -93,9 +94,18 @@ func BootTimeHypervisor() Hypervisor {
// the advice of this function and always ask for the enabled one.
func GetAvailableHypervisors() (all []string, enabled []string) {
all = hypervisorPriority
isHVTypeKube := base.IsHVTypeKube()
for _, v := range all {
if _, err := os.Stat(knownHypervisors[v].dom0handle); err == nil {
Contributor:

With the above suggestion this would become
if knownHypervisors[v].enabled() {
enabled = append(enabled, v)
}
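Putting the two suggestions together, a sketch of how the descriptor table and the lookup could end up; the enabled field, its conditions for kvm and kubevirt, and the constructor signature are assumptions for illustration, not merged code:

type hypervisorDesc struct {
	constructor       func() Hypervisor
	enabled           func() bool
	hvTypeFileContent string
}

var knownHypervisors = map[string]hypervisorDesc{
	XenHypervisorName: {constructor: newXen,
		enabled:           func() bool { _, err := os.Stat("/proc/xen"); return err == nil },
		hvTypeFileContent: "xen"},
	KVMHypervisorName: {constructor: newKvm,
		enabled:           func() bool { _, err := os.Stat("/dev/kvm"); return err == nil && !base.IsHVTypeKube() },
		hvTypeFileContent: "kvm"},
	KubevirtHypervisorName: {constructor: newKubevirt,
		enabled:           func() bool { _, err := os.Stat("/dev/kvm"); return err == nil && base.IsHVTypeKube() },
		hvTypeFileContent: "kubevirt"},
}

// ...and the loop in GetAvailableHypervisors simply becomes:
for _, v := range all {
	if knownHypervisors[v].enabled() {
		enabled = append(enabled, v)
	}
}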

@zedi-pramodh (author):

I will submit a new PR for these suggestions.

@zedi-pramodh (author):

#3892 submitted

@zedi-pramodh (author):

Added two more commits: one for all the vendor files that come from bumping the eve-api version, and the other addressing all review comments up to 04/23.


@eriknordmark left a comment


More comments

pkg/pillar/hypervisor/kubevirt.go (outdated):
}

// Create the VM
i := 5
Contributor:

Domainmgr is more patient than that. It has a separate goroutine for each DomainConfig, which handles the create, modify, and delete operations for that task. Only the main loop in domainmgr has the watchdog hook.

You can see that in the delete/inactivate path, where domainmgr can wait for minutes for a VM to halt.

So there is no reason for you not to wait forever here, at least not for watchdog reasons. HOWEVER, domainmgr has its own logic for retrying to start a task if it failed to boot; that uses a configurable timer (maybe 60 seconds by default?).

So the question is what type of failures you saw that made you add a retry here, and whether we can simplify this and use the existing retry in domainmgr.
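A schematic of that split, purely illustrative (none of these names are the real domainmgr identifiers): per-DomainConfig goroutines do the slow work, and only the main loop touches the watchdog.

for {
	select {
	case cfg := <-domainConfigChanges:
		// Each DomainConfig is handled in its own goroutine, so create, modify,
		// and delete can block for minutes (e.g. waiting for a VM to halt).
		go handleDomainTask(cfg)
	case <-stillRunning.C:
		// Periodic tick; nothing slow happens here.
	}
	// Only this main loop reports liveness, so long waits inside a task
	// goroutine cannot starve the watchdog.
	touchWatchdog()
}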

Comment on lines +1060 to +1057
// wait for pod to be in running state, sometimes can take long, but we only wait for
// about a minute in order not to cause watchdog action
Contributor:

See above. I am assuming it is only the create and modify functions in domainmgr which call this, so it can wait forever (but you might want to check for any errors while waiting forever).

@zedi-pramodh (author):

AFAIR, we retry here to make sure Kubernetes is up and running after a reboot; that can take more than 2 minutes or so. The number of retries in domainmgr may not be sufficient, and we do not want to generically bump up the retries in domainmgr.

@zedi-pramodh (author):

Maybe we can retry forever as long as the error is not 'kubernetes not ready'.

Contributor:

Alternatively, if the issue is about k3s not being up after a reboot, is there a way we can have, e.g., domainmgr wait for it to be fully up and running? Is there a way to check whether it is ready?

If that is the case, then we can return all failures to the caller (and domainmgr will retry the boot after some time using the generic code).

@zedi-pramodh (author) commented Apr 25, 2024:

I am sorry, it's been a long time since we wrote this code. We are looping for the pod to be ready, not for k3s. For k3s readiness we already handle that in domainmgr when it starts: we call kubeapi.WaitForKubernetes(), which should ensure k3s is ready.

This additional loop seems to be there to make sure that a launched VM actually reached a ready status; it generally takes some time for Kubernetes to schedule the VMI and start it. @naiming-zededa, can you respond to this question?
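For reference, a hedged sketch of that startup-side ordering; kubeapi.WaitForKubernetes is named above, but the signature and the cleanup helper shown here are assumptions, not the actual domainmgr code:

// On the kubevirt flavor, domainmgr blocks at startup until k3s is reachable,
// then removes any stale VM instances left over from before the reboot.
if ctx.hvTypeKube {
	// Signature assumed for illustration; the real helper lives in pkg/pillar/kubeapi.
	if err := kubeapi.WaitForKubernetes(agentName, ps, stillRunning); err != nil {
		log.Errorf("WaitForKubernetes failed: %v", err)
		return err
	}
	cleanupStaleVMIs(ctx) // placeholder for the stale-instance cleanup
}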

@eriknordmark:

Note that the go tests failed with
=== Failed
=== FAIL: hypervisor TestGetAvailableHypervisors (0.00s)
hypervisor_test.go:32: wrong list of available hypervisors: ["xen" "kvm" "kubevirt" "acrn" "containerd" "null"] vs. ["xen" "kvm" "acrn" "containerd" "null"]

@zedi-pramodh (author):

Ah, looks like I need to fix that test file to include kubevirt.
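The fix is presumably just the expected list in hypervisor_test.go growing a "kubevirt" entry; a sketch, with the surrounding test body assumed (needs the "reflect" and "testing" imports):

func TestGetAvailableHypervisors(t *testing.T) {
	// kubevirt now sits between kvm and acrn in the priority order
	expected := []string{"xen", "kvm", "kubevirt", "acrn", "containerd", "null"}
	all, _ := GetAvailableHypervisors()
	if !reflect.DeepEqual(all, expected) {
		t.Errorf("wrong list of available hypervisors: %v vs. %v", all, expected)
	}
}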


@eriknordmark left a comment


Once the build works we should run this through the tests.
And before merging, please squash the commits into a set that makes sense for the future: one commit to update the API and a second one with all of the code might make sense.

@zedi-pramodh (author):

Sure, I will squash all the commits once I submit one more commit for the latest review comments.

@eriknordmark:

Apparently there are also DCO issues - see https://github.com/lf-edge/eve/pull/3841/checks?check_run_id=24229268647

@eriknordmark:

The build appears to be failing due to
#0 0.059 /newlog/go.mod:5: unknown directive: toolchain

@zedi-pramodh (author):

> Apparently there are also DCO issues - see https://github.com/lf-edge/eve/pull/3841/checks?check_run_id=24229268647

The DCO issues seem to come from using the 'commit suggestion' option. I think I will git-squash everything into one commit; that should fix it.

@zedi-pramodh (author):

> The build appears to be failing due to #0 0.059 /newlog/go.mod:5: unknown directive: toolchain

No clue what this one is. Maybe it came in after bumping to the new eve-api.
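For context, the toolchain directive only became valid in go.mod with Go 1.21, and older go commands reject it with exactly this "unknown directive" message. Something along these lines in newlog/go.mod would trigger it while the build container still ships an older Go (module path and versions here are illustrative):

module github.com/lf-edge/eve/newlog

go 1.21

// Pre-1.21 go commands fail here with "unknown directive: toolchain"
toolchain go1.21.0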

    1) Add Kubevirt hypervisor type in hypervisor.go
    2) Since both kvm and kubevirt use /dev/kvm, we filter the enabled hypervisor by the flag isHVTypeKube
    3) kubevirt.go implements the hypervisor interface for the kubevirt hypervisor
    4) Support for VMs, containers launched as VMs, and launching containers as Kubernetes pods when virtualization type NOHYPER is chosen
    5) Host-device PCIe passthrough is supported for NVMe disks, USB, and network interfaces
    6) Add hvTypeKube to the domainmgr context
    7) For the kubevirt type, domainmgr waits until Kubernetes is ready and cleans up any stale VM instances during startup
    8) Metrics collection uses the Kubernetes metrics mechanism for both VMs and containers (pods), but keeps the existing domainmetrics
       structures to stay consistent with other flavors of EVE
    9) Added three more states in types.go: PENDING, SCHEDULING, and FAILED

    NOTE: This PR contains code written by Naiming Shen too, especially the pod-specific parts and the metrics collection.

    10) This commit is a squash of all the commits after addressing review comments

Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>
@zedi-pramodh (author):

Git-squashed to a single commit and fixed the DCO and the newlog build issue.


@eriknordmark left a comment


Run eden

@eriknordmark mentioned this pull request Apr 26, 2024
@eriknordmark merged commit 445c0ad into lf-edge:master Apr 26, 2024
31 of 36 checks passed
@christoph-zededa:

> I agree with you @christoph-zededa, but for these PRs all test cases are the same as the existing test cases, because nothing changes from the EVE API perspective. When we get to the clustering part, we will definitely need to incorporate some tests.

@zedi-pramodh, for example, if I add the following code:

diff --git a/pkg/pillar/hypervisor/kubevirt.go b/pkg/pillar/hypervisor/kubevirt.go
index 799526e43..fae92297f 100644
--- a/pkg/pillar/hypervisor/kubevirt.go
+++ b/pkg/pillar/hypervisor/kubevirt.go
@@ -91,6 +91,7 @@ var excludedMetrics = map[string]struct{}{
 type kubevirtMetrics map[string]types.DomainMetric
 
 func (metrics *kubevirtMetrics) fill(domainName, metricName string, value interface{}) {
+       panic("This never panics under tests")
        r, ok := (*metrics)[domainName]
        if !ok {
                // Index is not valid

and then run go test ./..., it does not fail; so this means the code is not covered by the existing test cases.
