-
Notifications
You must be signed in to change notification settings - Fork 582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
criu: Add support for pidfds #2449
base: criu-dev
Are you sure you want to change the base?
Conversation
acac25e
to
729891a
Compare
8f702f6
to
c455eaa
Compare
f0e574a
to
a5c9e56
Compare
@bsach64 The following blog post provides information on how to write good commit messages: https://cbea.ms/git-commit/ Similar to the Linux kernel, we use the commit history to make it easier for reviewers (and maintainers) to understand the problem that motivated particular patch and how this problem is being addressed. |
Hello @rst0git! |
cc230c5
to
e2746eb
Compare
I am having a hard time understanding why the Vagrant Fedora Rawhide and Vagrant Fedora Rawhide (No VDSO) fail on the |
5b8610e
to
57cd2da
Compare
b41aa44
to
92308f7
Compare
@avagin Any update? :) |
@mihalicyn could you help with reviewing this pr? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Hey @bsach64, if you have a time and interest, we can discuss support for threaded pidfds. |
yeah, sorry, I just forgot to press an "approve" button earlier. As I was following a whole development process of this thing there is no doubt that it's approved/reviewed from my side :-) |
Sure Alex :) |
We only use the last pid from the list in NSpid entry (from /proc/<pid>/fdinfo/<pidfd>) while restoring pidfds. The last pid refers to the pid of the process in the most deeply nested pid namespace. Since CRIU does not currently support nested pid namespaces, this entry is the one we want. After Linux 6.9, inode numbers can be used to compare pidfds. pidfds referring to the same process will have the same inode numbers. We use inode numbers to restore pidfds that point to dead processes. Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
5578e26
to
e572f5b
Compare
dead = lookup_dead_pidfd(info->pidfe->ino); | ||
BUG_ON(!dead); | ||
|
||
dead->count--; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dead
isn't shared between processes. If I don't miss something it can be updated from multiple processes. You need allocate these entries from a shared pool and to use some form of synchronization to ensure data consistency and prevent race conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, nice catch! Thanks!
@bsach64 you need to use shmalloc
function (just grep over the CRIU codebase). Also, we have a flag COLLECT_SHARED
you can pay attention to a different places in the CRIU code how we use it.
Idea is that you need to pre-allocate all the needed data-structures in a shared memory (usually, on the image collect stage before we start fork-ing processes), so then you can use this memory to pass data between different processes.
Feel free to reach me if you have any questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, Alex!
I will get back to you in a couple of days!
e572f5b
to
1f95f20
Compare
Process file descriptors (pidfds) were introduced to provide a stable handle on a process. They solve the problem of pid recycling. For a detailed explanation, see https://lwn.net/Articles/801319/ and http://www.corsix.org/content/what-is-a-pidfd Before Linux 6.9, anonymous inodes were used for the implementation of pidfds. So, we detect them in a fashion similiar to other fd types that use anonymous inodes by calling `readlink()`. After 6.9, pidfs (a file system for pidfds) was introduced. In 6.9 `S_ISREG()` returned true for pidfds, but this again changed with 6.10. (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/pidfs.c?h=v6.11-rc2#n285) After this change, pidfs inodes have no file type in st_mode in userspace. We use `PID_FS_MAGIC` to detect pidfds for kernel >= 6.9 Hence, check for pidfds occurs before the check for regular files. For pidfds that refer to dead processes, we lose the pid of the process as the Pid and NSpid fields in /proc/<pid>/fdinfo/<pidfd> change to -1. So, we create a temporary process for each unique inode and open pidfds that refer to this process. After all pidfds have been opened we kill this temporary process. This commit does not include support for pidfds that point to a specific thread, i.e pidfds opened with `PIDFD_THREAD` flag. Fixes: checkpoint-restore#2258 Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Ensures that entries in /proc/<pid>/fdinfo/<pidfd> are same. Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Ensure `pidfd_send_signal()` syscall works as expected after C/R. Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Validate that pidfds can been used to send signals to different processes after C/R using the `pidfd_send_signal()` syscall. Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
After, C/R of pidfds that point to dead processes their inodes might change. But if two pidfds point to same dead process they should continue to do so after C/R. This test ensures that this happens by calling `statx()` on pidfds after C/R and then comparing their inode numbers. Support for comparing pidfds by using `statx()` and inode numbers was introduced alongside pidfs. So if `f_type` of pidfd is not equal to `PID_FS_MAGIC` then we skip this test. signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
We get the read end of a pipe using `pidfd_getfd` and check if we can read from it after C/R. signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
We open a pidfd to a thread using `PIDFD_THREAD` flag and after C/R ensure that we can send signals using it with `PIDFD_SIGNAL_THREAD`. signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
1f95f20
to
56c8f7d
Compare
A PR for support of process file descriptors.
Here's a article that talks about features of pidfds, which we might have to support (and test):
http://www.corsix.org/content/what-is-a-pidfd
To Do:
pid
ofpidfd
is part of the process tree we are dumping. So, that we can ensure it exists on restore.pidfds
that point to dead processes.If the process for which a pidfd is open dies, the Pid and NSPid entries change to -1. So, we don't have the Pid necessary to recreate the process. We cannot just create a random process, open a pidfd to it, and kill it since pidfds opened for a specific process have the same inode number used to compare them.
criu/include/magic.h
pidfd_getfd
Support for
PIDFD_THREAD
(allows creation of a pidfd that points to a specific thread) not a part of this PR.Known Limitations:
CRIU doesn't support nested PID namespaces. As such we cannot C/R pidfd's that point to PIDs in nested namespaces. If we introduce support for nested PID namespaces we will have to rework some of this code.
This is work being done as part of Google Summer of Code 2024 (https://summerofcode.withgoogle.com/programs/2024/projects/vgpqxIxx)
A huge thank you to Alex @mihalicyn for guiding me and making this a wonderful experience :)