
Production runtime should use a VM for isolation/security #742

Open
alecthomas opened this issue Jan 6, 2024 · 9 comments

@alecthomas
Collaborator

alecthomas commented Jan 6, 2024

Runners currently execute user code directly on the same host that they run on. In k8s this is not terrible, but ideally FTL would execute user code inside a VM to completely isolate it. This would also allow us to restrict inbound/outbound network, and so on.

Useful references:

@alecthomas alecthomas changed the title Production runtime should use Firecracker for isolation/security Production runtime should use a VM for isolation/security Jan 6, 2024
@alecthomas alecthomas mentioned this issue Feb 7, 2024
@stuartwdouglas
Contributor

I don't think this is relevant now that we have moved to Kube as the primary target.

@alecthomas
Collaborator Author

alecthomas commented Sep 17, 2024

It is still relevant because the runner and the user code are separate security domains, and the runner is a policy enforcement point. The user code should not have access to anything but the runner "proxy" ports.

@stuartwdouglas
Contributor

In that case, from a kube PoV, you probably want the enforcement part of the runner in one container and the actual user runtime in a different container inside the pod. AFAIK you can't really do this sort of isolation if they are all inside the same container.
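
A minimal sketch of that pod layout, using the standard k8s.io/api Go types. The container names, images and port number here are hypothetical placeholders, not FTL's actual images or ports:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// runnerPod sketches a pod with the policy-enforcing runner and the user
// runtime as two separate containers.
func runnerPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "ftl-runner"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				{
					// Enforcement side: the runner "proxy" the user code has
					// to go through for everything.
					Name:  "runner",
					Image: "ftl-runner:latest",                              // hypothetical image
					Ports: []corev1.ContainerPort{{ContainerPort: 8893}},     // hypothetical port
				},
				{
					// User code in its own container: separate filesystem and
					// process tree, sharing only the pod's network namespace.
					Name:  "user-module",
					Image: "user-module:latest", // hypothetical image
				},
			},
		},
	}
}

func main() {
	fmt.Println(runnerPod().Name)
}
```

The isolation here is only at the container level; restricting what the user container can reach beyond the runner would still need network policy (or a mesh) on top.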

@alecthomas
Collaborator Author

That could be a good first step, and maybe sufficient long term, but we should run that past @AlexSzlavik. From everything I've read, containers are not a reliable security boundary. But perhaps now that we're all-in on Kubernetes we can combine this with other Kubernetes security approaches, like routing policies; coupled with our own policy enforcement, that might be fine.

@alecthomas
Collaborator Author

Chatted to Alex about this earlier today and we think the Runner could be a sidecar, with the user code proxying everything through it. Presumably the user container can be locked down such that it can't access anything except for the Runner.

One issue is that because we currently route everything through the Controller, the Runner needs to be able to differentiate between traffic originating from the user module and all other traffic in order to avoid routing loops. There's code in place to do that, but it's likely bitrotted, and also requires changes to each runtime, so the JVM runtime probably doesn't support this currently. Needs testing.
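
As a very rough sketch of the kind of origin check being described (the "X-Ftl-Origin" header name and the addresses are made up for illustration; this is not FTL's actual mechanism):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// originAwareProxy forwards requests differently depending on where they came
// from, so traffic arriving from the Controller is never sent back to it.
func originAwareProxy(controller, userModule *url.URL) http.Handler {
	toController := httputil.NewSingleHostReverseProxy(controller)
	toUserModule := httputil.NewSingleHostReverseProxy(userModule)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Ftl-Origin") == "module" {
			// Outbound call made by the local user module: send it out via
			// the Controller for routing to its destination.
			toController.ServeHTTP(w, r)
			return
		}
		// Everything else is inbound traffic destined for the local module.
		toUserModule.ServeHTTP(w, r)
	})
}

func main() {
	controller, _ := url.Parse("http://controller:8892") // hypothetical address
	userModule, _ := url.Parse("http://localhost:8894")  // hypothetical address
	log.Fatal(http.ListenAndServe(":8893", originAwareProxy(controller, userModule)))
}
```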

@alecthomas
Collaborator Author

One other thing that occurred to me is that we'll need to split the "runner" into two containers - the ftl-runner itself and the image that the user code runs on (i.e. what is currently the ftl-runner image).

@stuartwdouglas stuartwdouglas self-assigned this Sep 18, 2024
@AlexSzlavik
Contributor

I feel like running VMs in place of containers directly would be a challenge in K8s. Some googling around seemed to indicate that it's doable, but it would definitely be a specialized deployment strategy. I'd be concerned that this approach would get in the way of adoption of FTL.

If we aren't doing this, a sidecar model (a la envoy) makes sense to me. I guess this means that we'd have to split the runner image into two, right? The "edge" runner sidecar and the main "workload" runner: the former is responsible for interfacing with the cluster, while the latter is responsible for launching the user code. That component would probably also act as a bridge to the "edge" runner. The main reason to separate these is that we want isolation of user code from FTL cluster-internal components. We wouldn't want user code to be able to assume the capabilities of an FTL component.

Have we considered what a future FTL deployment in a common production-grade K8s deployment might look like? If the state of the art involves istio or other additional components, should we design for them now? Or at least make sure that we can interoperate with them?

@stuartwdouglas
Contributor

> I feel like running VMs in place of containers directly would be a challenge in K8s. Some googling around seemed to indicate that it's doable, but it would definitely be a specialized deployment strategy. I'd be concerned that this approach would get in the way of adoption of FTL.

All the approaches out there are fairly immature, and definitely specialized. I evaluated this a couple of years ago and ended up needing to write my own VM provisioner to support multi-platform builds on kube rather than using existing systems. Things may have gotten better since then, but it is still not something that we could require.

> If we aren't doing this, a sidecar model (a la envoy) makes sense to me. I guess this means that we'd have to split the runner image into two, right? The "edge" runner sidecar and the main "workload" runner: the former is responsible for interfacing with the cluster, while the latter is responsible for launching the user code. That component would probably also act as a bridge to the "edge" runner. The main reason to separate these is that we want isolation of user code from FTL cluster-internal components. We wouldn't want user code to be able to assume the capabilities of an FTL component.

This is doable. We want to avoid image building by the user, so it is slightly tricky, but it is doable. If we are going to require an OCI registry for artifacts anyway, one possibility is to have the controller generate the image (not via a docker build, but directly through the OCI registry). Another possibility is to have a shared volume between the sidecar and the runner container, and transfer the user code over the shared volume.
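
A sketch of the shared-volume variant, with hypothetical names, images and paths: an emptyDir volume mounted into both the sidecar and the user-code container, which the sidecar uses to hand over the deployment artifact.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// sharedVolumePodSpec mounts one emptyDir volume into both containers so the
// sidecar can write the deployment artifact where the user runtime reads it.
func sharedVolumePodSpec() corev1.PodSpec {
	deployments := corev1.VolumeMount{Name: "deployments", MountPath: "/deployments"}
	return corev1.PodSpec{
		Volumes: []corev1.Volume{{
			Name:         "deployments",
			VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
		}},
		Containers: []corev1.Container{
			{
				// Sidecar: pulls the artifact (e.g. from the OCI registry)
				// and writes it under /deployments.
				Name:         "runner",
				Image:        "ftl-runner:latest", // hypothetical image
				VolumeMounts: []corev1.VolumeMount{deployments},
			},
			{
				// User runtime: reads and executes the code from /deployments.
				Name:         "user-module",
				Image:        "ftl-runtime-base:latest", // hypothetical image
				VolumeMounts: []corev1.VolumeMount{deployments},
			},
		},
	}
}

func main() {
	spec := sharedVolumePodSpec()
	fmt.Println(len(spec.Containers), "containers sharing volume", spec.Volumes[0].Name)
}
```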

> Have we considered what a future FTL deployment in a common production-grade K8s deployment might look like? If the state of the art involves istio or other additional components, should we design for them now? Or at least make sure that we can interoperate with them?

We will definitely need istio, so we should be thinking about this.

@alecthomas
Collaborator Author

Looping in @tlongwell-block for his thoughts.
