Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgroup: add PIDs cgroup controller support #446

Merged
merged 4 commits into from
Jan 12, 2016
Merged

cgroup: add PIDs cgroup controller support #446

merged 4 commits into from
Jan 12, 2016

Conversation

cyphar
Copy link
Member

@cyphar cyphar commented Dec 20, 2015

This is a fixed up version of the now-reverted #58. The main issue with that code was that some of the memory cgroup's limits weren't being set at all if you were using systemd. Unfortunately, I seem to be unable to test the systemd portion (and the systemd test cases for memory don't actually test that the limits cause a process to fail).

All other cgroups were unaffected by this bug (as their limits were set using .Set(), and were joined either explicitly or using systemd), regardless of whether they were unsupported, partial supported or fully supported by systemd.

Some testing has revealed that the old setup didn't work either (because getSubsystemPath doesn't appear to work as expected).

/cc @mrunalp @hqhq @vishh @crosbymichael @dqminh

@cyphar cyphar changed the title Add PIDs cgroup controller support cgroupAdd PIDs cgroup controller support Dec 20, 2015
@cyphar cyphar changed the title cgroupAdd PIDs cgroup controller support cgroup: add PIDs cgroup controller support Dec 20, 2015
@cyphar
Copy link
Member Author

cyphar commented Dec 20, 2015

@mrunalp The main change I made to #58 was to actually explicitly join the cgroups and set memory. In addition, there we a couple of subtle bugs (needed to move the CheckCpushares check). After building Docker with my patches, systemd support appears to work (it doesn't work on master right now).

It should be noted that the "fix" for getting systemd to "work" is to just use the fs driver for basically everything. Unfortunately, there's deeper systemd issues here. At least it all works now.

@cyphar
Copy link
Member Author

cyphar commented Dec 22, 2015

@hqhq
Copy link
Contributor

hqhq commented Dec 22, 2015

What problems are you saying about systemd cgroup in docker upstream exactly? I'm sure we have some systemd cgroup users and we don't see reports from them. And what's the problem about getSubsystemPath in systemd cgroup? Can you elaborate?

@cyphar
Copy link
Member Author

cyphar commented Dec 22, 2015

@hqhq Well, if you try to use the version on master and set something like --cpu-shares you get errors when the container tries to start. There's also a few issues we've seen with customers where systemd randomly starts moving processes around slices.

@hqhq
Copy link
Contributor

hqhq commented Dec 23, 2015

@cyphar No I don't get any errors when setting --cpu-shares with the latest docker version on master, my box is centos 7.1 with systemd-219, but I do know there might be problems with some very new systemd versions, what systemd version are you using.

BTW, I don't have a box with latest kernel to test this PR out, but over all the codes in this PR looks good, if there are any systemd issues with docker, I can help diagnose.

@cyphar
Copy link
Member Author

cyphar commented Dec 23, 2015

@hqhq I'm testing this on Arch (with a custom-compiled kernel). The systemd problems I'm referring to are not related to this PR (there are a few). I'll open a separate issue (or PR) for these problems if appropriate (a SUSE customer has seen issues with systemd on SLE).

@cyphar
Copy link
Member Author

cyphar commented Jan 6, 2016

I've rebased (which was annoying, as I just discovered the git revert commits love to mess with git rebase). PTAL, the systemd issues have been resolved and we should merge this soon if we want PIDs cgroup support in Docker 1.10.

/ping @mrunalp @hqhq @vishh @crosbymichael @dqminh

@mrunalp
Copy link
Contributor

mrunalp commented Jan 6, 2016

Do you have a docker branch with this integrated?

Sent from my iPhone

On Jan 6, 2016, at 5:38 AM, Aleksa Sarai notifications@github.com wrote:

PTAL /cc @mrunalp @hqhq @vishh @crosbymichael @dqminh


Reply to this email directly or view it on GitHub.

@LK4D4
Copy link
Contributor

LK4D4 commented Jan 6, 2016

@mrunalp moby/moby#18697

@LK4D4
Copy link
Contributor

LK4D4 commented Jan 6, 2016

I tried with docker. It works perfectly with fs cgroups. But seems like systemd cgroups doesn't work at all in master, so I can't test it.

@mrunalp
Copy link
Contributor

mrunalp commented Jan 6, 2016

@LK4D4 Here is a fix for systemd in docker moby/moby#19149

@cyphar
Copy link
Member Author

cyphar commented Jan 7, 2016

@LK4D4 @mrunalp note that you'll have to re-run hack/vendor.sh, since @jfrazelle hasn't updated her branch yet.

@LK4D4
Copy link
Contributor

LK4D4 commented Jan 7, 2016

I tested definitely with this branch :)

@cyphar
Copy link
Member Author

cyphar commented Jan 7, 2016

Good to hear. 😸

@LK4D4
Copy link
Contributor

LK4D4 commented Jan 7, 2016

I tested and it works with systemd.

@LK4D4
Copy link
Contributor

LK4D4 commented Jan 8, 2016

LGTM

@cyphar
Copy link
Member Author

cyphar commented Jan 11, 2016

Due to some upstream kernel changes introduced in v4.4 (namely, allowing cgroups to track zombies) the tests that assume a small pids.max will work are broken. However, there also appears to be a difference in the functionality (there appears to be a near-constant overhead of about 3. This is probably caused by the v4.4 stuff, but I'm not sure why this overhead isn't resolved almost immediately).

Ultimately, we do the best we can. Anything else is either due to the fact we use Go (which isn't going to change soon) or are potential kernel bugs (gulp).

NOTE: This doesn't actually affect the driver code, just the tests.

/cc @LK4D4 @mrunalp

@cyphar
Copy link
Member Author

cyphar commented Jan 11, 2016

In any case, PTAL so we can merge this soon.

/ping @mrunalp @hqhq @vishh @crosbymichael @dqminh

@mrunalp
Copy link
Contributor

mrunalp commented Jan 11, 2016

@cyphar Yes, I am going to test this today. Thanks!

t.Fatalf("expected fork() to succeed with permissive pids limit")
}

// Enforce a restrictive limit limit (shell + 6 * true + 3).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/limit limit/limit/

@mrunalp
Copy link
Contributor

mrunalp commented Jan 11, 2016

I tested and it worked fine for me. Besides the nits, LGTM

Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.

Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.

One of the weird cases to deal with is systemd. Systemd sets some of the
cgroup configuration options, but not all of them. Because memory is a
special case, we need to explicitly set memory in the systemd Apply().
Otherwise, the rest can be safely re-applied in .Set() as usual.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
It is vital to loudly fail when a user attempts to set a cgroup limit
(rather than using the system default). Otherwise the user will assume
they have security they do not actually have. This mirrors the original
Apply() (that would set cgroup configs) semantics.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.

Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
@cyphar
Copy link
Member Author

cyphar commented Jan 11, 2016

@mrunalp Nits addressed.

@mrunalp
Copy link
Contributor

mrunalp commented Jan 12, 2016

@cyphar Thanks! LGTM

mrunalp pushed a commit that referenced this pull request Jan 12, 2016
cgroup: add PIDs cgroup controller support
@mrunalp mrunalp merged commit 4c767d7 into opencontainers:master Jan 12, 2016
@cyphar cyphar deleted the 18-add-pids-controller branch January 12, 2016 01:13
@jimmidyson jimmidyson mentioned this pull request Jan 12, 2016
@yijia2413
Copy link

great!!

stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017
Remove one JSON related MUST requirement
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants