Cloud-Api-Adaptor: PP Secure Comms #1776

Merged (1 commit) on May 7, 2024

davidhadas
Member

Resolves: #1770

Adds the Secure Comms feature, which secures communication between the cluster and the Peer Pods. It relies on KBS keys to establish an SSH channel between the cluster worker node (WN) and the Peer Pod (PP). The SSH channel can then be used to tunnel different communications to and from the PP.

@davidhadas davidhadas changed the title PP Secure Comms Cloud-Api-Adaptor: PP Secure Comms Apr 7, 2024
@davidhadas davidhadas force-pushed the PpSecureComms branch 2 times, most recently from b57eab5 to 0cda75a Compare April 9, 2024 09:06
@davidhadas davidhadas changed the title Cloud-Api-Adaptor: PP Secure Comms Cloud-Api-Adaptor: PP Secure Comms (WIP) Apr 9, 2024
@davidhadas davidhadas changed the title Cloud-Api-Adaptor: PP Secure Comms (WIP) Cloud-Api-Adaptor: PP Secure Comms Apr 9, 2024
@davidhadas davidhadas force-pushed the PpSecureComms branch 2 times, most recently from f5c47fb to 478d901 Compare April 9, 2024 20:51
@davidhadas davidhadas force-pushed the PpSecureComms branch 2 times, most recently from 8786e5d to c4c6223 Compare April 10, 2024 13:40
@davidhadas
Member Author

I added standalone testing: you can try out the Secure Comms feature outside of a CoCo environment using this standalone test.
See ./src/cloud-api-adaptor/docs/SecureComms under "## Testing".

@davidhadas davidhadas force-pushed the PpSecureComms branch 6 times, most recently from 0862ea0 to 322be5c Compare April 15, 2024 10:13
Contributor

@snir911 snir911 left a comment

Thanks! added some minor comments
Would be helpful to add also instructions/guidance to the docs as part of this patch series

src/cloud-api-adaptor/pkg/securecomms/ppssh/ppsecrets.go Outdated Show resolved Hide resolved
src/cloud-api-adaptor/install/yamls/caa-pod.yaml.stash Outdated Show resolved Hide resolved
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create", "patch", "update", "get", "watch", "list"]
---
Contributor

Why is this needed? Can access be limited to a specific secret?
resourceNames: ["my-configmap"]

Member Author

@davidhadas davidhadas Apr 15, 2024

The implementation creates secrets named pp-<peer pod SID> in the namespace. Each secret holds the key pair of one PP and is used to recover the PP keys after a CAA pod restart.

@davidhadas
Member Author

Thanks! added some minor comments Would be helpful to add also instructions/guidance to the docs as part of this patch series

See src/cloud-api-adaptor/docs/SecureComms - Is this enough?

Member

@c3d c3d left a comment

Typo in cover letter, "communciation" instead of "communication"

In cover letter of e218604

@davidhadas
Member Author

davidhadas commented Apr 15, 2024

Typo in cover letter, "communciation" instead of "communication"

Resolved

@bpradipt
Member

@davidhadas overall the code looks good to me. I only had some comments about readability.
I'm going to test this now.

@bpradipt
Member

I tested this on my setup using the docker provider.
I get errors from the AA

[2024-04-26T07:30:41Z ERROR attestation_agent::rpc::attestation::ttrpc] Call AA-KBC to get token failed: Unsupported token type: Matching variant not found
2024/04/26 07:30:41 [secure-comms] getKey sshclient/publicKey statusCode 200 success
2024/04/26 07:30:41 [secure-comms] PpSecrets sshclient/publicKey success
2024/04/26 07:30:41 [secure-comms] PpSecrets obtaining key pp-sid/privateKey
root_path /cdh, url_path /resource/default/pp-sid/privateKey
[2024-04-26T07:30:41Z INFO  confidential_data_hub::hub] get resource called: kbs:///default/pp-sid/privateKey
[2024-04-26T07:30:41Z ERROR attestation_agent::rpc::attestation::ttrpc] Call AA-KBC to get token failed: Unsupported token type: Matching variant not found
2024/04/26 07:30:41 [secure-comms] getKey pp-sid/privateKey statusCode 200 success
2024/04/26 07:30:41 [secure-comms] PpSecrets pp-sid/privateKey success
2024/04/26 07:30:41 [secure-comms] Attestation phase: failed getting keys from KBS: unable to parse public key: ssh: no key found
2024/04/26 07:30:41 [secure-comms] Attestation phase: getting keys from KBS
2024/04/26 07:30:41 [secure-comms] PpSecrets obtaining key sshclient/publicKey

I would have expected the sample attester to work. Need to check my AA build.

@bpradipt
Member

Overall LGTM. I just reviewed the source code, and haven't tested this PR on an actual environment. Has anyone had a chance to try this PR?

One thing I am not sure about is how the Kubernetes Secret is accessed. SSH keys are stored in a Secret object, and CAA retrieves it by contacting a Kubernetes API server. I think peerpod-ctrl/peerpodconfig-ctrl also uses a CRD object to manage CAA configuration. Do we need to use such an existing controller instead of CAA to manage SSH keys?

I'd like to hear comments from others.

@bpradipt @snir911 @huoqifeng @stevenhorsman

@yoheiueda for secure-comms, CAA creates a key pair for the PP, so using the peer-pod controller will not be feasible as I see it.

@bpradipt
Member

I'm able to run it successfully using the docker provider. My issue was with KBS: I had to use the following images:

kubectl set image -n kbs-operator-system deployment/kbs-deployment kbs=ghcr.io/confidential-containers/staged-images/kbs-grpc-as:latest as=ghcr.io/confidential-containers/staged-images/coco-as-grpc:latest rvps=ghcr.io/confidential-containers/staged-images/rvps:latest

Here are the relevant cloud-api-adaptor logs:

2024/04/28 11:55:55 [adaptor/cloud/docker] CreateInstance: name: "podvm-nginx-bd8697c48-fcs4q-0716d5c4"
2024/04/28 11:55:56 [adaptor/cloud/docker] CreateInstance: instanceID: "8f667729ab77331ef0c6b157e4c9efc561a3c060881998447496663a6778d743", ip: "172.17.0.2"
2024/04/28 11:55:56 [adaptor/cloud] failed to create PeerPod: create not allowed while custom resource definition is terminating
2024/04/28 11:55:56 [adaptor/cloud] created an instance podvm-nginx-bd8697c48-fcs4q-0716d5c4 for sandbox 0716d5c47d4ec49fd4c370cff10860fad1f8f2d3e55962f1e4f74ee487e59feb
2024/04/28 11:55:56 [secure-comms] InitPP read/create PP secret named: pp-0716d5c47d4ec49fd4c370cff10860fad1f8f2d3e55962f1e4f74ee487e59feb
2024/04/28 11:55:58 [secure-comms] CreateSecret 'pp-0716d5c47d4ec49fd4c370cff10860fad1f8f2d3e55962f1e4f74ee487e59feb'
2024/04/28 11:55:58 [secure-comms] Updating KBS with secret for: default/pp-0716d5c47d4ec49fd4c370cff10860fad1f8f2d3e55962f1e4f74ee487e59feb/privateKey
2024/04/28 11:55:58 [secure-comms] Inbound listening to port 36569
2024/04/28 11:55:58 [secure-comms] Attestation phase: starting
2024/04/28 11:55:58 [secure-comms] Attestation phase: ssh connected - 172.17.0.2:2222
2024/04/28 11:55:58 [secure-comms] Attestation phase: ssh skipped validating server's host key (type ssh-rsa) during attestation
2024/04/28 11:55:58 [secure-comms] Attestation phase: peer reported phase Attestation
2024/04/28 11:55:59 [secure-comms] Attestation phase: NewSshPeer - peer requested a tunnel channel for KBS
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy setting up for sid 0716d5c47d4ec49fd4c370cff10860fad1f8f2d3e55962f1e4f74ee487e59feb
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy modified URL to /kbs/v0/auth of host 10.104.217.75:8080
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy to /kbs/v0/auth status code 200
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy modified URL to /kbs/v0/attest of host 10.104.217.75:8080
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy to /kbs/v0/attest status code 200
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy recovered: runtime error: invalid memory address or nil pointer dereference
2024/04/28 11:55:59 [secure-comms] Attestation phase: NewSshPeer - peer requested a tunnel channel for KBS
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy setting up for sid 0716d5c47d4ec49fd4c370cff10860fad1f8f2d3e55962f1e4f74ee487e59feb
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy modified URL to /kbs/v0/resource/default/sshclient/publicKey of host 10.104.217.75:8080
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy to /kbs/v0/resource/default/sshclient/publicKey status code 200
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy modified URL to /kbs/v0/resource/default/pp-0716d5c47d4ec49fd4c370cff10860fad1f8f2d3e55962f1e4f74ee487e59feb/privateKey of host 10.104.217.75:8080
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy to /kbs/v0/resource/default/pp-0716d5c47d4ec49fd4c370cff10860fad1f8f2d3e55962f1e4f74ee487e59feb/privateKey status code 200
2024/04/28 11:55:59 [secure-comms] Attestation phase: peer reported it is upgrading to Kubernetes phase
2024/04/28 11:55:59 [secure-comms] Attestation phase: peer done by >>> chans closed <<<
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy recovered: runtime error: invalid memory address or nil pointer dereference
2024/04/28 11:55:59 [secure-comms] Attestation phase: done
2024/04/28 11:55:59 [secure-comms] Kubernetes phase: starting (number of restarts 0)
2024/04/28 11:55:59 [secure-comms] Kubernetes phase: ssh connected - 172.17.0.2:2222
2024/04/28 11:55:59 [tunneler/vxlan] vxlan ppvxlan1 (remote 172.17.0.2:4789, id: 555000) created at /proc/1/task/26/ns/net
2024/04/28 11:55:59 [tunneler/vxlan] vxlan ppvxlan1 created at /proc/1/task/26/ns/net
2024/04/28 11:55:59 [secure-comms] Kubernetes phase: ssh host key match - ssh-rsa
2024/04/28 11:55:59 [secure-comms] Kubernetes phase: peer reported phase Kubernetes
2024/04/28 11:55:59 [secure-comms] Kubernetes phase: AddInbound: KATAAPI
2024/04/28 11:55:59 [tunneler/vxlan] vxlan ppvxlan1 is moved to /var/run/netns/cni-f2c5ceca-3494-4ed8-a06d-c4e3c7a091fe
2024/04/28 11:55:59 [tunneler/vxlan] Add tc redirect filters between eth0 and vxlan1 on pod network namespace /var/run/netns/cni-f2c5ceca-3494-4ed8-a06d-c4e3c7a091fe
2024/04/28 11:55:59 [adaptor/proxy] Listening on /run/peerpod/pods/0716d5c47d4ec49fd4c370cff10860fad1f8f2d3e55962f1e4f74ee487e59feb/agent.ttrpc
2024/04/28 11:55:59 [adaptor/proxy] failed to init cri client, the err: cri runtime endpoint is not specified, it is used to get the image name from image digest
2024/04/28 11:55:59 [adaptor/proxy] Trying to establish agent proxy connection to 127.0.0.1:36569
2024/04/28 11:55:59 [adaptor/proxy] established agent proxy connection to 127.0.0.1:36569
2024/04/28 11:55:59 [secure-comms] Kubernetes phase: Inbound accept: KATAAPI
2024/04/28 11:55:59 [adaptor/cloud] agent proxy is ready
2024/04/28 11:55:59 [secure-comms] Kubernetes phase: NewInboundInstance OpenChannel opening tunnel for: KATAAPI
2024/04/28 11:55:59 [adaptor/proxy] CreateSandbox: hostname:nginx-bd8697c48-fcs4q sandboxId:0716d5c47d4ec49fd4c370cff10860fad1f8f2d3e55962f1e4f74ee487e59feb
2024/04/28 11:55:59 [adaptor/proxy]     storages:
2024/04/28 11:55:59 [adaptor/proxy]         mountpoint:/run/kata-containers/sandbox/shm source:shm fstype:tmpfs driver:ephemeral
2024/04/28 11:55:59 [adaptor/proxy] CreateContainer: containerID:0716d5c47d4ec49fd4c370cff10860fad1f8f2d3e55962f1e4f74ee487e59feb
...
2024/04/28 11:55:59 [adaptor/proxy] CreateContainer: calling PullImage for "quay.io/jitesoft/nginx:latest" before CreateContainer (cid: "143932dadb686177d11ddecd5a7d8a494ea6128e3fc142e95b37b755524dbc01")
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy modified URL to /kbs/v0/resource/default/credential/test of host 10.104.217.75:8080
2024/04/28 11:55:59 [secure-comms] Outbound KBS acceptProxy to /kbs/v0/resource/default/credential/test status code 404
2024/04/28 11:56:00 [adaptor/proxy] CreateContainer: successfully pulled image "quay.io/jitesoft/nginx:latest"
2024/04/28 11:56:00 [adaptor/proxy] StartContainer: containerID:143932dadb686177d11ddecd5a7d8a494ea6128e3fc142e95b37b755524dbc01

Here are the entries in kbs

# ls -l /opt/confidential-containers/kbs/repository/default/
drwxr-xr-x. 2 root root 23 Apr 28 11:55 sshclient
drwxr-xr-x. 2 root root 24 Apr 28 11:55 pp-0716d5c47d4ec49fd4c370cff10860fad1f8f2d3e55962f1e4f74ee487e59feb

@davidhadas davidhadas force-pushed the PpSecureComms branch 3 times, most recently from 0affa49 to d09c78f Compare April 30, 2024 18:53
Member

@bpradipt bpradipt left a comment

/lgtm
Thanks @davidhadas for your patience and addressing the review comments.

src/cloud-api-adaptor/docs/SecureComms.md Show resolved Hide resolved
src/cloud-api-adaptor/docs/SecureComms.md Outdated Show resolved Hide resolved

## Testing

Testing securecomms as a standalone can be done by using:
Member

Just to note that with the cluster, kbs and secure comms set up I got this working:

# go run ./test/securecomms/double/main.go
2024/05/02 12:35:54 [secure-comms] Using PP SecureComms: InitSshServer version v0.2
2024/05/02 12:35:54 [secure-comms] Inbound listening to port 7000
2024/05/02 12:35:54 [secure-comms] Inbound listening to port 16443
2024/05/02 12:35:54 [secure-comms] Inbound listening to port 9053
...
2024/05/02 12:36:02 [secure-comms] Attestation phase: starting
2024/05/02 12:36:02 [secure-comms] Attestation phase: client connected
2024/05/02 12:36:02 [secure-comms] Attestation phase: connected
2024/05/02 12:36:02 [secure-comms] Attestation phase: ssh connected - 127.0.0.1:2222
2024/05/02 12:36:03 [secure-comms] Attestation phase: SSH server initialized keys
2024/05/02 12:36:03 [secure-comms] Attestation phase: SSH server initialized with NoClientAuth
2024/05/02 12:36:03 [secure-comms] Attestation phase: ssh skipped validating server's host key (type ssh-rsa) during attestation
2024/05/02 12:36:03 [secure-comms] Attestation phase: logged-in without key
2024/05/02 12:36:03 [secure-comms] Attestation phase: peer reported phase Attestation
2024/05/02 12:36:03 [secure-comms] Attestation phase: peer reported phase Attestation
...
2024/05/02 12:36:03 [secure-comms] Attestation phase: done
HttpClient start : http://127.0.0.1:45561
HttpClient sending req: http://127.0.0.1:45561
2024/05/02 12:36:03 [secure-comms] Kubernetes phase: starting (number of restarts 0)
2024/05/02 12:36:03 [secure-comms] Kubernetes phase: ssh connected - 127.0.0.1:2222
2024/05/02 12:36:03 [secure-comms] Kubernetes client connected
2024/05/02 12:36:03 [secure-comms] Kubernetes phase: connected
...
2024/05/02 12:36:04 [secure-comms] Kubernetes phase: done
2024/05/02 12:36:04 [secure-comms] SshClientInstance DisconnectPP success
2024/05/02 12:36:04 [secure-comms] DeleteSecret 'pp-sid'
*** SUCCESS ***

Maybe not what I'd describe as standalone as such, but it's good to get it working :)
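
The `Inbound listening to port ...` lines in this output suggest that each inbound tunnel is exposed as a local TCP listener whose traffic is carried to the other side. The sketch below shows that relay pattern using only the Go standard library. It rests on a loud assumption: a plain TCP connection stands in for the SSH channel, and `startInbound`, `startEcho`, `relay`, and `roundTrip` are hypothetical helpers, not functions from the securecomms package.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// relay copies bytes in both directions between a and b and closes both
// connections as soon as either direction ends.
func relay(a, b net.Conn) {
	done := make(chan struct{}, 2)
	go func() { io.Copy(a, b); done <- struct{}{} }()
	go func() { io.Copy(b, a); done <- struct{}{} }()
	<-done
	a.Close()
	b.Close()
}

// startInbound listens on an ephemeral local port and relays every accepted
// connection to target. In the real feature the far side would be an SSH
// channel; a plain TCP dial is used here as a stand-in.
func startInbound(target string) (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			local, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			remote, err := net.Dial("tcp", target)
			if err != nil {
				local.Close()
				continue
			}
			go relay(local, remote)
		}
	}()
	return ln, nil
}

// startEcho starts a stand-in for the service behind the tunnel: it echoes
// back whatever it receives.
func startEcho() (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func() {
				defer c.Close()
				io.Copy(c, c)
			}()
		}
	}()
	return ln, nil
}

// roundTrip sends msg to addr and reads back exactly len(msg) bytes.
func roundTrip(addr, msg string) (string, error) {
	c, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer c.Close()
	if _, err := c.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(c, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	echo, err := startEcho()
	if err != nil {
		panic(err)
	}
	defer echo.Close()

	inbound, err := startInbound(echo.Addr().String())
	if err != nil {
		panic(err)
	}
	defer inbound.Close()

	reply, err := roundTrip(inbound.Addr().String(), "ping")
	if err != nil {
		panic(err)
	}
	fmt.Println(reply)
}
```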


## Future Plans

- Add DeleteResource() support in KBS, KBC, api-server-rest, then clean up resources added by Secure Comms to KBS whenever a Peer Pod fails to be created or when a Peer Pod is terminated.
Member

I would like to see Pradipta's suggestion on the future plans list: move the agent-protocol-forwarder config to a file and provide it via user-data, which process-user-data can read to create the config, so that we don't need a separate podvm build to use this feature. That would enable much easier testing and release of this feature.

Member

Since this doc is specific to secure comms, and using a config file for agent-protocol-forwarder is not really specific to secure comms, my suggestion would be to create a tracker issue in this repo.

Member

That works for me too

Member

@stevenhorsman stevenhorsman left a comment

There is a lot of code here and networking is not my speciality, so I've not done the most in-depth code review. However, I've followed the docs and got the tests passing. Given that this is an optional feature that is off by default, and that this is the first step on the journey, I don't personally see an issue with merging it. I would like to hear from some of the other reviewers who gave comments and offer more value than I do! Thanks for the updates and patience @davidhadas

Secure Comms feature to secure communication between the cluster and the
Peer Pods. Rely on KBS keys to establish an ssh channel between the
cluster WN and the PP. The ssh channel can then be used to tunnel
different communications to/from the PP.

Use peer-pods-cm to initialize adaptor's secure-comms
Use agent-protocol-forwarder.service to initialize forwarder's secure-comms

Includes testing for Secure-Comms running as a stand-alone
Includes unit testing for Secure-Comms

Signed-off-by: David Hadas <david.hadas@gmail.com>
@bpradipt
Member

bpradipt commented May 3, 2024

One of the libvirt e2e is failing.
@stevenhorsman @davidhadas any ideas ?

@davidhadas
Member Author

davidhadas commented May 3, 2024

One of the libvirt e2e is failing. @stevenhorsman @davidhadas any ideas ?

This does not seem related: tests passed yesterday afaik and the only changed file is a readme file.
I suggest rerunning the test.

I also suggest that PR owners be granted some more privileges, such as the ability to re-run tests :)

@stevenhorsman
Member

The TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment test occasionally fails on the nightly tests too. I think this is the first time I've seen it on PR tests, but I've re-run and I don't think it's related to the changes here.

@davidhadas
Member Author

The TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment test occasionally fails on the nightly tests too. I think this is the first time I've seen it on PR tests, but I've re-run and I don't think it's related to the changes here.

TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment failed for a second time.

@bpradipt
Member

bpradipt commented May 7, 2024

Since the e2e finally passed, I'm merging this.

@bpradipt bpradipt merged commit 1345716 into confidential-containers:main May 7, 2024
27 checks passed
Labels
test_e2e_libvirt Run Libvirt e2e tests
Development

Successfully merging this pull request may close these issues.

PeerPods Secure Comms
8 participants