rkt: kernel: uidshifts at mount time #1057
Just dumping some thoughts from when I worked on this previously: the idea behind this feature is to allow mounting the same volume multiple times while applying a different UID mapping to each mount. For example, this allows mounting … To support such mappings, we must make them a property of a bind mount (more precisely, a property of …). In the kernel, all metadata that is transferred between file systems and the kernel is done via …
Those functions would map the kuid_t according to the rules in the vfsmount. To make sure no one accesses … If this infrastructure is in place, you can be sure that all UIDs are always correctly mapped. No need to fiddle around with … Right now, a lot of code in the kernel does not have a … What I would do (and partly did so far) is to change … I think that's it so far. Feel free to set up a public GitHub repo and I'll help out!
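The per-mount lookup that "those functions" would perform can be sketched in userspace C. This is only an illustration: `struct mnt_uidmap`, `mnt_map_uid_up()` and `mnt_map_uid_down()` are hypothetical names, and treating a NULL map as "no mapping" is an assumption. The extent layout borrows the `(first, lower_first, count)` format that the kernel uses for `/proc/<pid>/uid_map`.

```c
#include <stddef.h>
#include <stdint.h>

/* Marker for "no mapping", mirroring the kernel's INVALID_UID ((uid_t)-1). */
#define MNT_UID_INVALID UINT32_MAX

/* One range of a (hypothetical) per-mount mapping. */
struct mnt_uid_extent {
    uint32_t first;       /* first UID as seen through the mount */
    uint32_t lower_first; /* first UID as stored on the underlying fs */
    uint32_t count;       /* number of consecutive UIDs in the range */
};

/* Hypothetical mapping that a vfsmount would carry. */
struct mnt_uidmap {
    size_t nr_extents;
    const struct mnt_uid_extent *extents;
};

/* On-disk UID -> UID seen through this mount (e.g. for stat()).
 * A NULL map is treated as the identity (assumption). */
static uint32_t mnt_map_uid_up(const struct mnt_uidmap *map, uint32_t uid)
{
    if (map == NULL)
        return uid;
    for (size_t i = 0; i < map->nr_extents; i++) {
        const struct mnt_uid_extent *e = &map->extents[i];
        if (uid >= e->lower_first && uid - e->lower_first < e->count)
            return e->first + (uid - e->lower_first);
    }
    return MNT_UID_INVALID;
}

/* UID supplied through this mount (e.g. chown()) -> UID stored on disk. */
static uint32_t mnt_map_uid_down(const struct mnt_uidmap *map, uint32_t uid)
{
    if (map == NULL)
        return uid;
    for (size_t i = 0; i < map->nr_extents; i++) {
        const struct mnt_uid_extent *e = &map->extents[i];
        if (uid >= e->first && uid - e->first < e->count)
            return e->lower_first + (uid - e->first);
    }
    return MNT_UID_INVALID;
}
```

With a single extent `{0, 100000, 65536}`, a file owned by UID 101000 on disk shows up as UID 1000 through the mount, and a `chown` to 1000 through the mount stores 101000.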
@dvdhrm thanks for the write-up!
To avoid implementing it in every possible filesystem on Linux while still having something correct, would it help to add a flag on …? That way, an initial patch could be accepted upstream without modifying most filesystems. It would be similar to how FS_USERNS_MOUNT keeps most filesystems from being mounted inside a user namespace.
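A minimal sketch of the opt-in idea, modelled on how filesystems without FS_USERNS_MOUNT are refused inside a user namespace. Everything here is hypothetical: the `SKETCH_*` flag bits, `struct sketch_fs_type`, and the choice of `-EINVAL` as the error. The real FS_USERNS_MOUNT is defined in `include/linux/fs.h` and its value is not reproduced here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits, local to this sketch. */
#define SKETCH_FS_USERNS_MOUNT (1u << 0)
#define SKETCH_FS_MNT_UIDSHIFT (1u << 1) /* hypothetical opt-in for uidshifts */

struct sketch_fs_type {
    const char *name;
    unsigned int fs_flags;
};

/* Refuse a uidshift option unless the filesystem explicitly opted in,
 * so unconverted filesystems keep their current behaviour. */
static int sketch_check_uidshift(const struct sketch_fs_type *type,
                                 bool uidshift_requested)
{
    if (!uidshift_requested)
        return 0;
    if (!(type->fs_flags & SKETCH_FS_MNT_UIDSHIFT))
        return -EINVAL;
    return 0;
}
```

The design point is that the default is "not supported": a filesystem only gets the option once someone has audited it and set the flag.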
Is your boolean option temporary, or do you want to keep that API? What's the name of the option? How should it work with stackable UID mappings (with stackable user namespaces)? What happens if you bind mount a source directory that already has a mapping? Do you stack the two mappings? What if you bind mount recursively … If the UID mappings are not stackable, how do you prevent a process in a user namespace from reverting the effect of the UID mapping by replacing it with another one (or with an empty mapping)?
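One possible reading of "stack the two mappings" is plain composition: the source mount's mapping runs first, and the new bind mount's mapping is applied on top of its result. The sketch below assumes that semantics and single-range maps; `struct uid_range_map` and `apply_stacked()` are hypothetical names.

```c
#include <stdint.h>

#define UID_UNMAPPED UINT32_MAX

/* Single-range map: UIDs [lower, lower + count) on the source side
 * appear as [upper, upper + count) on the mount side. */
struct uid_range_map {
    uint32_t lower;
    uint32_t upper;
    uint32_t count;
};

static uint32_t apply_map(const struct uid_range_map *m, uint32_t uid)
{
    if (uid == UID_UNMAPPED || uid < m->lower || uid - m->lower >= m->count)
        return UID_UNMAPPED;
    return m->upper + (uid - m->lower);
}

/* Composition: layers[0] is the source mount's map, later entries are
 * bind mounts stacked on top. Unmapped UIDs stay unmapped. */
static uint32_t apply_stacked(const struct uid_range_map *layers,
                              int nr_layers, uint32_t uid)
{
    for (int i = 0; i < nr_layers; i++)
        uid = apply_map(&layers[i], uid);
    return uid;
}
```

Note that composition alone does not answer the last question: a second layer that is the inverse of the first (`{100000, 0, 65536}` on top of `{0, 100000, 65536}`) turns the result back into the identity over that range.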
You cannot do that. You really have to change … However, what you can do is make all those filesystems pass NULL as …
Temporary. I really think the API should reuse the … Maybe the logic I described will work out in the end, but I haven't thought it through. This is really just for simple one-layer testing: no stacking involved, etc., as that is, IMHO, an orthogonal problem.
@tixxdz added some kernel selftests (still not complete) in this branch:
Some updates: we currently have a blocker. In commit systemd/linux@a628126 of the test branch, we pass the vfsmount context to generic_fillattr() so that the uidshift mapping is done inside that function and all VFS functions and other filesystems can take advantage of it. However, later in the stat64(), lstat64(), fstat64() syscalls and others, the UID mapping is overwritten in cp_new_stat64() http://lxr.free-electrons.com/source/fs/stat.c#L363. cp_new_stat64() is one of the last functions called to copy the stat struct to user space, and inside it the uid and gid fields are mapped again, this time into current_user_ns(). What we want is for the uid and gid fields to stay mapped into the user namespace that was pinned during the bind mount.
Solution 1): …
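This is not a solution, only a userspace simulation of the blocker itself. It assumes single-extent user namespaces, models from_kuid() and from_kuid_munged() (which, as far as I can tell, is what cp_new_stat64() uses) with hypothetical `sim_*` helpers, and uses 65534, the default `/proc/sys/kernel/overflowuid`, for unmapped IDs.

```c
#include <stdint.h>

#define SIM_INVALID_UID UINT32_MAX
#define SIM_OVERFLOW_UID 65534u /* default /proc/sys/kernel/overflowuid */

/* One-extent user namespace: ns UIDs [first, first + count) correspond
 * to kernel UIDs [lower_first, lower_first + count). */
struct sim_userns {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

/* Like from_kuid(): kernel UID -> namespace-local UID. */
static uint32_t sim_from_kuid(const struct sim_userns *ns, uint32_t kuid)
{
    if (kuid >= ns->lower_first && kuid - ns->lower_first < ns->count)
        return ns->first + (kuid - ns->lower_first);
    return SIM_INVALID_UID;
}

/* Like from_kuid_munged(): unmapped UIDs become the overflow UID. */
static uint32_t sim_from_kuid_munged(const struct sim_userns *ns, uint32_t kuid)
{
    uint32_t uid = sim_from_kuid(ns, kuid);
    return uid == SIM_INVALID_UID ? SIM_OVERFLOW_UID : uid;
}

/* The stat path described above: generic_fillattr() already translated
 * st_uid into the pinned namespace, but cp_new_stat64() still treats the
 * value as a kernel UID and maps it into the caller's namespace again. */
static uint32_t sim_stat_uid(const struct sim_userns *pinned,
                             const struct sim_userns *caller,
                             uint32_t inode_kuid)
{
    uint32_t st_uid = sim_from_kuid(pinned, inode_kuid); /* fillattr step */
    return sim_from_kuid_munged(caller, st_uid);          /* second mapping */
}
```

With a pinned namespace of `0 100000 65536` and a caller in that same namespace, a file owned by kernel UID 100000 should appear as UID 0, but the double translation turns it into the overflow UID 65534.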
This issue is a follow-up to the UID shifts for the rootfs discussed in the "Investigation user namespaces" issue #986.
To take full advantage of user namespaces in rkt, we are planning to support uidshifts during mounts. Doing the uidshift on the rootfs as a mount option will allow daemons to run unprivileged with dynamically assigned UIDs, without those UIDs leaking into the persistent file system.
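To make "dynamically assigned uids" concrete, here is a hypothetical allocation scheme: each pod gets its own block of 65536 host UIDs starting at 100000, written out as a line in the `/proc/<pid>/uid_map` format (`<ID inside ns> <ID outside ns> <length>`). The base, block size and function names are assumptions for illustration, not rkt's actual policy.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical scheme: pod N owns host UIDs
 * [SKETCH_UID_BASE + N * SKETCH_UID_COUNT, ... + SKETCH_UID_COUNT). */
#define SKETCH_UID_BASE 100000u
#define SKETCH_UID_COUNT 65536u

/* First host UID of a pod's block (no overflow check: small indices only). */
static uint32_t pod_uid_base(uint32_t pod_index)
{
    return SKETCH_UID_BASE + pod_index * SKETCH_UID_COUNT;
}

/* Write one uid_map line mapping container UIDs 0..65535 onto the pod's
 * host block. Returns snprintf()'s result (characters that would be written). */
static int format_uid_map_line(char *buf, size_t len, uint32_t pod_index)
{
    return snprintf(buf, len, "%u %u %u\n", 0u,
                    (unsigned)pod_uid_base(pod_index),
                    (unsigned)SKETCH_UID_COUNT);
}
```

Because the blocks never overlap, a file created by "root" in pod 1 is owned by host UID 165536 on disk; the mount-time uidshift is what would keep such IDs out of the persistent rootfs.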
Model
Background
When a rootfs tree is used by overlayfs, we would need some vfs_uid= shifting option.
This was already mentioned in the LWN article "UID/GID identity and filesystems": http://lwn.net/Articles/637431/
https://lists.linux-foundation.org/pipermail/containers/2014-June/034630.html
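The text only names a `vfs_uid=` option, so its value syntax is open. The sketch below assumes a hypothetical `vfs_uid=<first>:<lower_first>:<count>` form inside a comma-separated mount option string and shows how it could be parsed; the format and `parse_vfs_uid()` are illustrations only.

```c
#include <stdio.h>
#include <string.h>

struct vfs_uid_opt {
    unsigned int first;       /* UID as seen through the mount */
    unsigned int lower_first; /* UID stored in the lower (rootfs) layer */
    unsigned int count;       /* length of the range */
};

/* Find "vfs_uid=" at the start of an option and parse its value.
 * Returns 0 on success, -1 if the option is missing or malformed. */
static int parse_vfs_uid(const char *opts, struct vfs_uid_opt *out)
{
    static const char key[] = "vfs_uid=";
    const char *p = opts;
    int consumed = 0;

    /* Only accept matches at an option boundary (start or after a comma). */
    while ((p = strstr(p, key)) != NULL) {
        if (p == opts || p[-1] == ',')
            break;
        p++;
    }
    if (p == NULL)
        return -1;

    p += sizeof key - 1;
    if (sscanf(p, "%u:%u:%u%n", &out->first, &out->lower_first,
               &out->count, &consumed) != 3)
        return -1;
    /* The value must end at the end of the string or at the next option. */
    if (p[consumed] != '\0' && p[consumed] != ',')
        return -1;
    return out->count == 0 ? -1 : 0;
}
```

For example, `"lowerdir=/a,vfs_uid=0:100000:65536"` yields `first = 0`, `lower_first = 100000`, `count = 65536`.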
Implementation:
1.1) A generic VFS mount option?
mount(source, target, "bind", MS_BIND|MS_REMOUNT, vfs_uidshifts_data)
1.2) An overlayfs mount option or a completely new overlayfs-like fs
Some file systems, like NFS, already do some mapping.
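For option 1.1, the userspace side could look roughly like this: a plain bind mount followed by the `MS_BIND|MS_REMOUNT` call, with the proposed option string passed as `data`. The option string format and the helper name are hypothetical, and as far as I know a stock kernel does not interpret `data` for this flag combination, so this only shows where the option would be plumbed through.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/* Bind-mount source on target, then remount the bind mount passing the
 * proposed (hypothetical) uidshift option string. Returns 0 or -1/errno. */
static int bind_mount_with_uidshift(const char *source, const char *target,
                                    const char *uidshift_data)
{
    /* Step 1: plain bind mount. The filesystem type is ignored for MS_BIND,
     * so NULL is passed instead of "bind". */
    if (mount(source, target, NULL, MS_BIND, NULL) != 0)
        return -1;

    /* Step 2: the MS_BIND|MS_REMOUNT call from option 1.1. */
    if (mount(source, target, NULL, MS_BIND | MS_REMOUNT, uidshift_data) != 0) {
        int saved_errno = errno;
        umount(target); /* don't leave an unshifted bind mount behind */
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```

A caller would pass something like `"vfs_uid=0:100000:65536"`; the call needs CAP_SYS_ADMIN in the mount namespace's owning user namespace.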