Docker provider #1743
I had to use Ubuntu binaries rather than Fedora ones, since I rely on the Kind image, which is Ubuntu-based, as the base podvm container image.
First review of this PR, covering the copyright headers and some strings.
Will follow the documentation to try this new provider on my dev machine later.
@bpradipt this is very cool, thanks for adding this new provider.
Thanks @bpradipt for bringing this great feature; I've left several comments...
A few in-progress comments from my review.
The CI failures are unrelated and look like a network issue.
A few more comments
I've tried restarting the documented process from scratch and it's now failing with:
Looking into the kata-agent log (thanks, docker exec 😄) I see
So I think your rebase (and hence the main code?) might be broken? This is the same error that Zvonko reported, but I thought we were creating the…
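A minimal sketch of that docker exec debug path, assuming the pod VM container keeps the podvm- name prefix seen in the CAA logs and that the podvm image runs systemd, so kata-agent messages land in the journal:

```bash
# Find the pod VM container that the docker provider started
# (the podvm- name prefix is taken from the CAA logs in this thread).
docker ps --filter "name=podvm-"

# Tail the kata-agent log inside it; -t filters journal entries by the
# "kata-agent" syslog identifier, which is an assumption about the image.
docker exec -it <podvm-container> journalctl -t kata-agent -f
```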
I do see the cdh warning on my setup, but the container starts.
Try setting AA_KBC_PARAMS in the peer-pods-cm to create the cdh.toml. The main error seems to be this one:
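For reference, a minimal sketch of setting that value; the namespace, the app label, and the KBS address are illustrative assumptions, not values from this thread:

```bash
# Merge AA_KBC_PARAMS into the peer-pods-cm ConfigMap; the namespace and
# the KBS URL below are assumptions for illustration.
kubectl -n confidential-containers-system patch configmap peer-pods-cm \
  --type merge \
  -p '{"data":{"AA_KBC_PARAMS":"cc_kbc::http://kbs.example.com:8080"}}'

# Restart the CAA pod so it picks up the new value
# (the app label is also an assumption).
kubectl -n confidential-containers-system delete pod -l app=cloud-api-adaptor
```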
Apologies for my ignorance (and if I've missed the doc), but where exactly should I put it? I tried in:
and deleted the CAA pod to trigger a restart, but then hit a docker entrypoint error:
so it seemed to pass it through correctly, but I guess the quotes screwed it up?
Ah, this looks like a code bug. I have pushed a fix. Please try with a new CAA image.
LGTM, thanks! @bpradipt
I retried, and when I added AA_KBC_PARAMS to the peer-pods-cm it got rid of the cdh error, but the underlying problem still remains:
At one point this was caused by the aa-offline-fs files not being there, but I've checked and they are:
Unfortunately image-rs doesn't do logging, so it's pretty tricky to work out what has gone wrong, but I don't understand how everyone else has got this working, or which step in the instructions I've got wrong.
Can you try once with this image: quay.io/confidential-containers/podvm-docker-image?
Same issue:
Just to check, this is the podvm image I have just pulled:
It is super late for you now, so we can debug it tomorrow, my morning, if you have time?
I built CAA from this PR and deployed the docker provider successfully, but I failed to get a running nginx pod.
docker version
Client: Docker Engine - Community
Version: 26.1.0
API version: 1.45
Go version: go1.21.9
Git commit: 9714adc
Built: Mon Apr 22 17:06:41 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 26.1.0
API version: 1.45 (minimum version 1.24)
Go version: go1.21.9
Git commit: c8af8eb
Built: Mon Apr 22 17:06:41 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.31
GitCommit: e377cd56a71523140ca6ae87e30244719194a521
runc:
Version: 1.1.12
GitCommit: v1.1.12-0-g51d5e94
docker-init:
Version: 0.19.0
GitCommit: de40ad0
systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2024-04-25 02:54:24 UTC; 47min ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 964 (dockerd)
Tasks: 21
Memory: 130.1M
CPU: 1.188s
CGroup: /system.slice/docker.service
└─964 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Apr 25 02:54:22 liudali-x86-build dockerd[964]: time="2024-04-25T02:54:22.881054344Z" level=info msg="Starting up"
Apr 25 02:54:22 liudali-x86-build dockerd[964]: time="2024-04-25T02:54:22.897835741Z" level=info msg="detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.conf"
Apr 25 02:54:23 liudali-x86-build dockerd[964]: time="2024-04-25T02:54:23.200749279Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
Apr 25 02:54:23 liudali-x86-build dockerd[964]: time="2024-04-25T02:54:23.680000559Z" level=info msg="Loading containers: start."
Apr 25 02:54:23 liudali-x86-build dockerd[964]: time="2024-04-25T02:54:23.987127523Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Apr 25 02:54:24 liudali-x86-build dockerd[964]: time="2024-04-25T02:54:24.020446148Z" level=info msg="Loading containers: done."
Apr 25 02:54:24 liudali-x86-build dockerd[964]: time="2024-04-25T02:54:24.059005316Z" level=info msg="Docker daemon" commit=c8af8eb containerd-snapshotter=false storage-driver=overlay2 version=26.1.0
Apr 25 02:54:24 liudali-x86-build dockerd[964]: time="2024-04-25T02:54:24.059409939Z" level=info msg="Daemon has completed initialization"
Apr 25 02:54:24 liudali-x86-build systemd[1]: Started Docker Application Container Engine.
Apr 25 02:54:24 liudali-x86-build dockerd[964]: time="2024-04-25T02:54:24.735013181Z" level=info msg="API listen on /run/docker.sock"
systemctl status docker.socket
● docker.socket - Docker Socket for the API
Loaded: loaded (/lib/systemd/system/docker.socket; enabled; vendor preset: enabled)
Active: active (running) since Thu 2024-04-25 02:54:19 UTC; 50min ago
Triggers: ● docker.service
Listen: /run/docker.sock (Stream)
Tasks: 0 (limit: 38487)
Memory: 0B
CPU: 935us
CGroup: /system.slice/docker.socket
Apr 25 02:54:19 liudali-x86-build systemd[1]: Starting Docker Socket for the API...
Apr 25 02:54:19 liudali-x86-build systemd[1]: Listening on Docker Socket for the API.
I installed Docker by following this page: https://docs.docker.com/engine/install/ubuntu/
If you are executing as a non-root user (e.g. ubuntu), can you check whether you are able to run docker commands (e.g. docker info, docker ps)? Otherwise you'll need to first run the post-install steps for Docker and then retry. Also, please check whether the docker socket is mounted inside the CAA pod.
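A minimal sketch of those checks; the post-install commands follow the Docker docs, while the namespace, daemonset name, and socket path are assumptions:

```bash
# Docker post-install steps for a non-root user, per the Docker docs.
sudo groupadd docker 2>/dev/null || true
sudo usermod -aG docker "$USER"
newgrp docker

# Sanity-check that docker works without sudo.
docker info
docker ps

# Check that the docker socket is visible inside the CAA pod; the
# namespace, daemonset name, and socket path here are assumptions.
kubectl -n confidential-containers-system exec \
  ds/cloud-api-adaptor-daemonset -- ls -l /var/run/docker.sock
```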
The same image works on my env. Let me create a fresh env and check.
So I'm not sure what the difference is, but I also created a completely fresh environment and it seems to be working now:
For reference, my full history on this box is:
@bpradipt @stevenhorsman I figured out the root cause of why my dev machine did not work as expected.
I created the cluster using ./libvirt/kcli_cluster.sh create, but I set up Docker on my dev machine directly. Docker needs to be installed on the worker node peer-pods-worker-0.
After installing Docker on the worker node and pulling the image inside it, I hit the same error that @stevenhorsman reported before:
2024/04/25 14:26:56 [podnetwork] routes on netns /var/run/netns/cni-fe8183e1-ec9f-325a-62d7-bd5e1248ee13
2024/04/25 14:26:56 [podnetwork] 0.0.0.0/0 via 10.244.1.1 dev eth0
2024/04/25 14:26:56 [podnetwork] 10.244.0.0/16 via 10.244.1.1 dev eth0
2024/04/25 14:26:56 [adaptor/cloud] Credentials file is not in a valid Json format, ignored
2024/04/25 14:26:56 [adaptor/cloud] stored /run/peerpod/pods/d4835bc53007be1d9115e73708a7eb1f277657458db6b457fc6fb6881b77ead1/daemon.json
2024/04/25 14:26:56 [adaptor/cloud] create a sandbox d4835bc53007be1d9115e73708a7eb1f277657458db6b457fc6fb6881b77ead1 for pod nginx-5bb58f7796-7blpm in namespace default (netns: /var/run/netns/cni-fe8183e1-ec9f-325a-62d7-bd5e1248ee13)
2024/04/25 14:26:56 [adaptor/cloud/docker] CreateInstance: name: "podvm-nginx-5bb58f7796-7blpm-d4835bc5"
2024/04/25 14:26:56 [adaptor/cloud/docker] CreateInstance: instanceID: "55e08ff5777096488a5abd3be50d6c5c9693dcae0abd8080f321b2dab5490764", ip: "172.17.0.3"
2024/04/25 14:26:56 [util/k8sops] nginx-5bb58f7796-7blpm is now owning a PeerPod object
2024/04/25 14:26:56 [adaptor/cloud] created an instance podvm-nginx-5bb58f7796-7blpm-d4835bc5 for sandbox d4835bc53007be1d9115e73708a7eb1f277657458db6b457fc6fb6881b77ead1
2024/04/25 14:26:56 [tunneler/vxlan] vxlan ppvxlan1 (remote 172.17.0.3:4789, id: 555002) created at /proc/1/task/12/ns/net
2024/04/25 14:26:56 [tunneler/vxlan] vxlan ppvxlan1 created at /proc/1/task/12/ns/net
2024/04/25 14:26:56 [tunneler/vxlan] vxlan ppvxlan1 is moved to /var/run/netns/cni-fe8183e1-ec9f-325a-62d7-bd5e1248ee13
2024/04/25 14:26:56 [tunneler/vxlan] Add tc redirect filters between eth0 and vxlan1 on pod network namespace /var/run/netns/cni-fe8183e1-ec9f-325a-62d7-bd5e1248ee13
2024/04/25 14:26:56 [adaptor/proxy] Listening on /run/peerpod/pods/d4835bc53007be1d9115e73708a7eb1f277657458db6b457fc6fb6881b77ead1/agent.ttrpc
2024/04/25 14:26:56 [adaptor/proxy] failed to init cri client, the err: cri runtime endpoint is not specified, it is used to get the image name from image digest
2024/04/25 14:26:56 [adaptor/proxy] Trying to establish agent proxy connection to 172.17.0.3:15150
2024/04/25 14:26:58 [adaptor/proxy] established agent proxy connection to 172.17.0.3:15150
2024/04/25 14:26:58 [adaptor/cloud] agent proxy is ready
2024/04/25 14:26:58 [adaptor/proxy] CreateSandbox: hostname:nginx-5bb58f7796-7blpm sandboxId:d4835bc53007be1d9115e73708a7eb1f277657458db6b457fc6fb6881b77ead1
2024/04/25 14:26:58 [adaptor/proxy] storages:
2024/04/25 14:26:58 [adaptor/proxy] mountpoint:/run/kata-containers/sandbox/shm source:shm fstype:tmpfs driver:ephemeral
2024/04/25 14:27:01 [adaptor/proxy] CreateContainer: containerID:d4835bc53007be1d9115e73708a7eb1f277657458db6b457fc6fb6881b77ead1
2024/04/25 14:27:01 [adaptor/proxy] mounts:
2024/04/25 14:27:01 [adaptor/proxy] destination:/proc source:proc type:proc
2024/04/25 14:27:01 [adaptor/proxy] destination:/dev source:tmpfs type:tmpfs
2024/04/25 14:27:01 [adaptor/proxy] destination:/dev/pts source:devpts type:devpts
2024/04/25 14:27:01 [adaptor/proxy] destination:/dev/shm source:/run/kata-containers/sandbox/shm type:bind
2024/04/25 14:27:01 [adaptor/proxy] destination:/dev/mqueue source:mqueue type:mqueue
2024/04/25 14:27:01 [adaptor/proxy] destination:/sys source:sysfs type:sysfs
2024/04/25 14:27:01 [adaptor/proxy] destination:/dev/shm source:/run/kata-containers/sandbox/shm type:bind
2024/04/25 14:27:01 [adaptor/proxy] destination:/etc/resolv.conf source:/run/kata-containers/shared/containers/d4835bc53007be1d9115e73708a7eb1f277657458db6b457fc6fb6881b77ead1-1303004197fc5a0c-resolv.conf type:bind
2024/04/25 14:27:01 [adaptor/proxy] annotations:
2024/04/25 14:27:01 [adaptor/proxy] io.kubernetes.cri.sandbox-name: nginx-5bb58f7796-7blpm
2024/04/25 14:27:01 [adaptor/proxy] io.kubernetes.cri.sandbox-namespace: default
2024/04/25 14:27:01 [adaptor/proxy] io.kubernetes.cri.sandbox-cpu-quota: 0
2024/04/25 14:27:01 [adaptor/proxy] io.kubernetes.cri.sandbox-id: d4835bc53007be1d9115e73708a7eb1f277657458db6b457fc6fb6881b77ead1
2024/04/25 14:27:01 [adaptor/proxy] io.kubernetes.cri.sandbox-cpu-shares: 2
2024/04/25 14:27:01 [adaptor/proxy] io.kubernetes.cri.container-type: sandbox
2024/04/25 14:27:01 [adaptor/proxy] io.katacontainers.pkg.oci.container_type: pod_sandbox
2024/04/25 14:27:01 [adaptor/proxy] io.katacontainers.pkg.oci.bundle_path: /run/containerd/io.containerd.runtime.v2.task/k8s.io/d4835bc53007be1d9115e73708a7eb1f277657458db6b457fc6fb6881b77ead1
2024/04/25 14:27:01 [adaptor/proxy] io.kubernetes.cri.sandbox-cpu-period: 100000
2024/04/25 14:27:01 [adaptor/proxy] io.kubernetes.cri.sandbox-memory: 0
2024/04/25 14:27:01 [adaptor/proxy] io.kubernetes.cri.sandbox-log-directory: /var/log/pods/default_nginx-5bb58f7796-7blpm_c9855d8f-b89d-4efd-a7e3-b7bd2b323d5e
2024/04/25 14:27:01 [adaptor/proxy] nerdctl/network-namespace: /var/run/netns/cni-fe8183e1-ec9f-325a-62d7-bd5e1248ee13
2024/04/25 14:27:01 [adaptor/proxy] io.kubernetes.cri.sandbox-uid: c9855d8f-b89d-4efd-a7e3-b7bd2b323d5e
2024/04/25 14:27:01 [adaptor/proxy] getImageName: no pause image specified uses default pause image: registry.k8s.io/pause:3.7
2024/04/25 14:27:01 [adaptor/proxy] CreateContainer: calling PullImage for "registry.k8s.io/pause:3.7" before CreateContainer (cid: "d4835bc53007be1d9115e73708a7eb1f277657458db6b457fc6fb6881b77ead1")
2024/04/25 14:27:02 [adaptor/proxy] CreateContainer: successfully pulled image "registry.k8s.io/pause:3.7"
2024/04/25 14:27:02 [adaptor/proxy] CreateContainer fails: rpc error: code = Internal desc = Establishing a D-Bus connection
Caused by:
0: I/O error: No such file or directory (os error 2)
1: No such file or directory (os error 2)
2024/04/25 14:27:02 [adaptor/proxy] DestroySandbox
2024/04/25 14:27:02 [adaptor/proxy] shutting down socket forwarder
2024/04/25 14:27:02 [adaptor/cloud/docker] DeleteInstance: instanceID: "55e08ff5777096488a5abd3be50d6c5c9693dcae0abd8080f321b2dab5490764"
2024/04/25 14:27:02 [util/k8sops] nginx-5bb58f7796-7blpm's owned PeerPod object can now be deleted
I suspect it could be something to do with cgroup entries. We have seen similar errors earlier with the kata-agent. Debugging is hard.
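A speculative debugging sketch along those lines, reusing the podvm container name from the log above; whether the image exposes D-Bus at this path, and whether the container survives long enough to exec into, are assumptions:

```bash
# The "Establishing a D-Bus connection ... No such file or directory"
# failure suggests a missing D-Bus socket or systemd state in the pod VM.
docker exec podvm-nginx-5bb58f7796-7blpm-d4835bc5 \
  ls -l /run/dbus/system_bus_socket

# Check whether systemd (and the cgroup/D-Bus plumbing the kata-agent
# relies on) is healthy inside the container.
docker exec podvm-nginx-5bb58f7796-7blpm-d4835bc5 systemctl status
```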
Add initial support to run peer-pods in a docker container. We create a container image with all the necessary components required to act as a pod VM. Currently we rely on the K8s Kind image as the base podvm container image. Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Some ARGs were missing that were present in Fedora and RHEL. Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Add docker provider in entrypoint.sh. Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Allow installation via kustomize files. Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Run go mod tidy and update for all sub-projects. Also update the base golang version to go1.21 for the sub-projects. Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
@stevenhorsman @liudalibj I have addressed all your comments.
I think this is good enough to merge. There are some question marks about failures we've seen, but we don't have an easy debug path at the moment and we have got things working, so I think it's enough to get merged, as it's primarily a developer option. Going forward it might be good to see some e2e tests to help ensure stability. Thanks for the idea and execution @bpradipt.
For quick testing:
Once CAA is deployed, change the image using the following command (see the sketch after these steps):
Download the pod VM container image
Create a sample pod with the runtimeClass kata-remote
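The exact commands were not preserved above; a hedged sketch of the flow, where the daemonset name, namespace, container name, and CAA image reference are assumptions, while the podvm image and the kata-remote runtime class come from this thread:

```bash
# Point the CAA daemonset at the image built from this PR; the names
# and the image reference here are assumptions for illustration.
kubectl -n confidential-containers-system set image \
  ds/cloud-api-adaptor-daemonset \
  cloud-api-adaptor-con=quay.io/<your-user>/cloud-api-adaptor:latest

# Pre-pull the pod VM container image (mentioned earlier in this thread)
# on the worker node running the docker provider.
docker pull quay.io/confidential-containers/podvm-docker-image

# Create a sample pod with the kata-remote runtime class.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  runtimeClassName: kata-remote
  containers:
  - name: nginx
    image: nginx
EOF
```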