Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

criu: Add support for pidfds #2449

Open
wants to merge 8 commits into
base: criu-dev
Choose a base branch
from

Conversation

bsach64
Copy link
Contributor

@bsach64 bsach64 commented Jul 21, 2024

A PR for support of process file descriptors.

Here's a article that talks about features of pidfds, which we might have to support (and test):
http://www.corsix.org/content/what-is-a-pidfd

To Do:

  • Check if the pid of pidfd is part of the process tree we are dumping. So, that we can ensure it exists on restore.
  • Add support for pidfds that point to dead processes.
    If the process for which a pidfd is open dies, the Pid and NSPid entries change to -1. So, we don't have the Pid necessary to recreate the process. We cannot just create a random process, open a pidfd to it, and kill it since pidfds opened for a specific process have the same inode number used to compare them.
  • Change the magic in criu/include/magic.h
  • test for pidfd_getfd
  • Make CI green
  • Check functionality before and after 6.9
  • Better commit messages (add related issues that are fixed as a result of this PR)

Support for PIDFD_THREAD (allows creation of a pidfd that points to a specific thread) not a part of this PR.

Known Limitations:
CRIU doesn't support nested PID namespaces. As such we cannot C/R pidfd's that point to PIDs in nested namespaces. If we introduce support for nested PID namespaces we will have to rework some of this code.

This is work being done as part of Google Summer of Code 2024 (https://summerofcode.withgoogle.com/programs/2024/projects/vgpqxIxx)
A huge thank you to Alex @mihalicyn for guiding me and making this a wonderful experience :)

@mihalicyn mihalicyn self-requested a review July 22, 2024 11:31
@mihalicyn mihalicyn marked this pull request as ready for review July 23, 2024 20:01
@mihalicyn mihalicyn marked this pull request as draft July 23, 2024 20:03
images/pidfd.proto Outdated Show resolved Hide resolved
criu/include/magic.h Outdated Show resolved Hide resolved
criu/pidfd.c Outdated Show resolved Hide resolved
criu/pidfd.c Outdated Show resolved Hide resolved
criu/pidfd.c Outdated Show resolved Hide resolved
@bsach64 bsach64 force-pushed the pidfd-support branch 6 times, most recently from acac25e to 729891a Compare July 29, 2024 20:21
@bsach64 bsach64 force-pushed the pidfd-support branch 3 times, most recently from 8f702f6 to c455eaa Compare August 2, 2024 18:32
@rst0git rst0git mentioned this pull request Aug 3, 2024
@bsach64 bsach64 force-pushed the pidfd-support branch 4 times, most recently from f0e574a to a5c9e56 Compare August 8, 2024 16:12
@rst0git
Copy link
Member

rst0git commented Aug 9, 2024

Better commit messages (add related issues that are fixed as a result of this PR)

@bsach64 The following blog post provides information on how to write good commit messages: https://cbea.ms/git-commit/

Similar to the Linux kernel, we use the commit history to make it easier for reviewers (and maintainers) to understand the problem that motivated particular patch and how this problem is being addressed.

@bsach64
Copy link
Contributor Author

bsach64 commented Aug 11, 2024

Better commit messages (add related issues that are fixed as a result of this PR)

@bsach64 The following blog post provides information on how to write good commit messages: https://cbea.ms/git-commit/

Similar to the Linux kernel, we use the commit history to make it easier for reviewers (and maintainers) to understand the problem that motivated particular patch and how this problem is being addressed.

Hello @rst0git!
I have updated the commit messages and have tried to follow the rules stated here: https://cbea.ms/git-commit/

criu/proc_parse.c Outdated Show resolved Hide resolved
@bsach64 bsach64 force-pushed the pidfd-support branch 2 times, most recently from cc230c5 to e2746eb Compare August 12, 2024 09:45
@bsach64
Copy link
Contributor Author

bsach64 commented Aug 12, 2024

I am having a hard time understanding why the Vagrant Fedora Rawhide and Vagrant Fedora Rawhide (No VDSO) fail on the pidfd_child test.
Can I get some help regarding this?

@bsach64 bsach64 force-pushed the pidfd-support branch 6 times, most recently from 5b8610e to 57cd2da Compare August 28, 2024 12:26
@bsach64 bsach64 marked this pull request as ready for review August 28, 2024 13:03
@bsach64 bsach64 force-pushed the pidfd-support branch 2 times, most recently from b41aa44 to 92308f7 Compare August 29, 2024 14:34
@avagin avagin self-requested a review August 29, 2024 16:10
@bsach64
Copy link
Contributor Author

bsach64 commented Sep 13, 2024

@avagin Any update? :)

@avagin
Copy link
Member

avagin commented Sep 14, 2024

@mihalicyn could you help with reviewing this pr?

criu/pidfd.c Outdated Show resolved Hide resolved
Copy link
Member

@mihalicyn mihalicyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mihalicyn
Copy link
Member

Hey @bsach64, if you have a time and interest, we can discuss support for threaded pidfds.
Next week I'll be on LPC 2024, but just after that I'll be happy chat with you.

@mihalicyn
Copy link
Member

@mihalicyn could you help with reviewing this pr?

yeah, sorry, I just forgot to press an "approve" button earlier. As I was following a whole development process of this thing there is no doubt that it's approved/reviewed from my side :-)

@bsach64
Copy link
Contributor Author

bsach64 commented Sep 15, 2024

Hey @bsach64, if you have a time and interest, we can discuss support for threaded pidfds.

Next week I'll be on LPC 2024, but just after that I'll be happy chat with you.

Sure Alex :)
Please send me an email, whenever you are free!

We only use the last pid from the list in NSpid entry (from
/proc/<pid>/fdinfo/<pidfd>) while restoring pidfds.
The last pid refers to the pid of the process in the most deeply nested
pid namespace. Since CRIU does not currently support nested pid
namespaces, this entry is the one we want.

After Linux 6.9, inode numbers can be used to compare pidfds. pidfds
referring to the same process will have the same inode numbers. We use
inode numbers to restore pidfds that point to dead processes.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
@bsach64 bsach64 force-pushed the pidfd-support branch 2 times, most recently from 5578e26 to e572f5b Compare September 15, 2024 09:54
dead = lookup_dead_pidfd(info->pidfe->ino);
BUG_ON(!dead);

dead->count--;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead isn't shared between processes. If I don't miss something it can be updated from multiple processes. You need allocate these entries from a shared pool and to use some form of synchronization to ensure data consistency and prevent race conditions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, nice catch! Thanks!

@bsach64 you need to use shmalloc function (just grep over the CRIU codebase). Also, we have a flag COLLECT_SHARED you can pay attention to a different places in the CRIU code how we use it.

Idea is that you need to pre-allocate all the needed data-structures in a shared memory (usually, on the image collect stage before we start fork-ing processes), so then you can use this memory to pass data between different processes.

Feel free to reach me if you have any questions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, Alex!
I will get back to you in a couple of days!

Process file descriptors (pidfds) were introduced to provide a stable
handle on a process. They solve the problem of pid recycling.

For a detailed explanation, see https://lwn.net/Articles/801319/ and
http://www.corsix.org/content/what-is-a-pidfd

Before Linux 6.9, anonymous inodes were used for the implementation of
pidfds. So, we detect them in a fashion similiar to other fd types that
use anonymous inodes by calling `readlink()`.
After 6.9, pidfs (a file system for pidfds) was introduced.
In 6.9 `S_ISREG()` returned true for pidfds, but this again changed with
6.10.
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/pidfs.c?h=v6.11-rc2#n285)
After this change, pidfs inodes have no file type in st_mode in
userspace.
We use `PID_FS_MAGIC` to detect pidfds for kernel >= 6.9
Hence, check for pidfds occurs before the check for regular files.

For pidfds that refer to dead processes, we lose the pid of the process
as the Pid and NSpid fields in /proc/<pid>/fdinfo/<pidfd> change to -1.
So, we create a temporary process for each unique inode and open pidfds
that refer to this process. After all pidfds have been opened we kill
this temporary process.

This commit does not include support for pidfds that point to a specific
thread, i.e pidfds opened with `PIDFD_THREAD` flag.

Fixes: checkpoint-restore#2258

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Ensures that entries in /proc/<pid>/fdinfo/<pidfd> are same.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Ensure `pidfd_send_signal()` syscall works as expected after C/R.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Validate that pidfds can been used to send signals to different
processes after C/R using the `pidfd_send_signal()` syscall.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
After, C/R of pidfds that point to dead processes their inodes might
change. But if two pidfds point to same dead process they should
continue to do so after C/R.

This test ensures that this happens by calling `statx()` on pidfds after
C/R and then comparing their inode numbers.

Support for comparing pidfds by using `statx()` and inode numbers was
introduced alongside pidfs. So if `f_type` of pidfd is not equal to
`PID_FS_MAGIC` then we skip this test.

signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
We get the read end of a pipe using `pidfd_getfd` and check if we can
read from it after C/R.

signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
We open a pidfd to a thread using `PIDFD_THREAD` flag and after C/R
ensure that we can send signals using it with `PIDFD_SIGNAL_THREAD`.

signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants