Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for private CA on the Agent side #3494

Merged
merged 6 commits into from
Jul 26, 2023

Conversation

dbutenhof
Copy link
Member

PBENCH-1209

The staging server and local development servers now use HTTPS with certs signed by a private pbench CA. An HTTPS connection can't be validated without a reference to this CA.

The primary change here is in the contrib/containerized-pbench/pbench script which now looks for a private CA definition, maps the CA bundle file into the container, and defines REQUESTS_CA_BUNDLE within the container.

In an attempt to handle RPM installs, there's also logic to support a new [results] section pbench_ca configuration variable to define a CA path that will be used to verify the PUT to a server.

Note, this isn't being used for the --relay path, as that's currently not using https and we'll need to figure out how we want to configure this in the future.

@dbutenhof dbutenhof self-assigned this Jul 11, 2023
@dbutenhof dbutenhof marked this pull request as draft July 12, 2023 11:58
ndokos
ndokos previously approved these changes Jul 12, 2023
@dbutenhof
Copy link
Member Author

I'm taking this out of draft mode, because it's probably worth moving forward, even though I'm also not entirely satisfied. For convenience, here's the text of my second commit:

I'm not entirely sure how far I want to take this. What I have so far allows a
user to use the pbench script to run a Pbench Agent command with a specified
CA bundle against a "runlocal" or the staging server by specifying the CA file
with export PB_AGENT_CA=server/pbenchinacan/etc/pki/tls/certs/pbench_CA.crt
before using ./contrib/containerized-pbench/pbench pbench-results-move.

If you use --server or --relay you don't need to worry about having a
~/.config/pbench/pbench-agent.cfg file.

The remaining "problem" is that if you don't want to use --server and expect
to get your configuration from ~/.config/pbench/pbench-agent.cfg, even if it
has the new server_ca configuration value set, there's no logic to cause the
CA to be mapped into the container so it won't actually work. (That is, you
still need the PB_AGENT_CA environment variable.) I find this unsatisfying;
but any solution I can see (e.g., unconditionally mapping in the private CA if
we can find it, or building a mechanism to look for [results] server_ca in
the config file, given there's no standard config-parser CLI available outside
the container) at least equally unsatisfying.

So for now, this is at least a hack allowing us to use the pbench container
script against staging and runlocal servers. Maybe that's enough.

@dbutenhof dbutenhof requested a review from ndokos July 14, 2023 12:33
contrib/containerized-pbench/pbench Outdated Show resolved Hide resolved
@ndokos
Copy link
Member

ndokos commented Jul 17, 2023

The server functional tests are failing with a traceback in TestInventory.test_compare.

Odd that unit tests didn't see this. It's possible this is due to the new pquisby 0.0.9, which I believe was published to fix Varshini's comparison problems ... except I'm not finding the relevant code in the GitHub project to compare. 😦

ndokos
ndokos previously approved these changes Jul 17, 2023
@dbutenhof dbutenhof requested a review from webbnh July 18, 2023 11:51
@dbutenhof
Copy link
Member Author

The server functional tests are failing with a traceback in TestInventory.test_compare.

Odd that unit tests didn't see this. It's possible this is due to the new pquisby 0.0.9, which I believe was published to fix Varshini's comparison problems ... except I'm not finding the relevant code in the GitHub project to compare. frowning

We're now up to pquisby 0.0.11, which appears to fix the problem.

Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm requesting changes mostly because of the placeholder in the pbench-agent.cfg file, and partly for the fallback verify= value (and slightly for the readlink usage). However, I have some concerns about this change in general and I would like to offer a different approach.

One of the thrusts of the contrib/containerized-pbench/pbench script is that it provides for a "no-install" invocation of the Agent: you can download a single file, and run it without doing an RPM install or creating a Git checkout. Entangling it with the CA file doubles the complication.

Moreover, my/our expectation is that access to the Pbench Server will not require the Pbench private CA once we have TLS deployed in production, so an external user will not require the CA file. Nevertheless, it's quite true that access to the Staging server or to a development server will require the private CA, and only the user will know which of those they are targeting with the pbench script, so there does need to be some sort of "knob" which the user can employ to tell the script which configuration it is running in.

But, that knob does not necessarily have to be PB_AGENT_CA as presented, and there is an alternative to mapping a user-provided CA into the container (which is, I think, the root of my concerns): we can build the CA into the container (it is very small, relative to the size of the container, so the impact of doing so will be imperceptible). Then all we need is for the script to set REQUESTS_CA_BUNDLE inside the container, if the user requests it.

FWIW, rather than extending the code in the pbench script which tries to use or generate a local config file, I would prefer to see it torn out! The principal purpose of that was to allow the user to specify the Pbench Server "on the fly", but, with the wonderful --server switch that you added, we no longer need a config file to accomplish that. So, I recommend removing all that code -- if the user needs to use a custom Agent config file, s/he can use PB_AGENT_PODMAN_OPTIONS to map one over the default config file inside the container.

agent/config/pbench-agent.cfg Outdated Show resolved Hide resolved
lib/pbench/agent/results.py Outdated Show resolved Hide resolved
contrib/containerized-pbench/pbench Outdated Show resolved Hide resolved
contrib/containerized-pbench/pbench Show resolved Hide resolved
contrib/containerized-pbench/pbench Outdated Show resolved Hide resolved
Copy link
Member Author

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, that knob does not necessarily have to be PB_AGENT_CA as presented, and there is an alternative to mapping a user-provided CA into the container (which is, I think, the root of my concerns): we can build the CA into the container (it is very small, relative to the size of the container, so the impact of doing so will be imperceptible). Then all we need is for the script to set REQUESTS_CA_BUNDLE inside the container, if the user requests it.

I really don't like the idea of building our hack development CA into the real agent container. Frankly a "product deployed" agent shouldn't be able to trust our staging server. And while we could add another layer to the "real" agent container for development, that's a difference and complication I'd rather avoid.

FWIW, rather than extending the code in the pbench script which tries to use or generate a local config file, I would prefer to see it torn out!

I very nearly did that along the way, but I felt reluctant to tear out all your hard work so I struggled to retain it. 😆

agent/config/pbench-agent.cfg Outdated Show resolved Hide resolved
contrib/containerized-pbench/pbench Outdated Show resolved Hide resolved
contrib/containerized-pbench/pbench Show resolved Hide resolved
contrib/containerized-pbench/pbench Outdated Show resolved Hide resolved
lib/pbench/agent/results.py Outdated Show resolved Hide resolved
@webbnh
Copy link
Member

webbnh commented Jul 19, 2023

I really don't like the idea of building our hack development CA into the real agent container.

Why? Simply including it is not a security hazard, and it will increase the size of the image by about 0.00006%.

Frankly a "product deployed" agent shouldn't be able to trust our staging server.

Again, why? I'm guessing that using our CA will not be the default (because the default should be to target the production Server or to require the user to explicitly select a Server to target), and so this would just be an unused option in that case.

And while we could add another layer to the "real" agent container for development, that's a difference and complication I'd rather avoid.

I agree -- and it's not worth adding a layer for just 1.2Kb.

FWIW, rather than extending the code in the pbench script which tries to use or generate a local config file, I would prefer to see it torn out!

I very nearly did that along the way, but I felt reluctant to tear out all your hard work so I struggled to retain it. 😆

Thank you -- I very much appreciate that sentiment -- but, while the config file is an excellent solution on the Server side, I don't feel like it is a good one on the client/Agent side, and, now that we have --server, I would like to see the Agent config file disappear...so feel free to excise it from here! 😀

Copy link
Member Author

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the idea of building our hack development CA into the real agent container.

Why? Simply including it is not a security hazard, and it will increase the size of the image by about 0.00006%.

It feels very much like including testing code in the product: it's impure, and exposes what should be internal. It opens the risk that REQUESTS_CA_BUNDLE=/etc/pki/tls/certs will trust something that shouldn't be trusted. The idea makes me extremely uncomfortable, and I can't convince myself that it's "innocuous".

I very nearly did that along the way, but I felt reluctant to tear out all your hard work so I struggled to retain it. 😆

Thank you -- I very much appreciate that sentiment -- but, while the config file is an excellent solution on the Server side, I don't feel like it is a good one on the client/Agent side, and, now that we have --server, I would like to see the Agent config file disappear...so feel free to excise it from here! 😀

Yeah: done. That closes out the discussion about virtual tabbing and HERE files, too. (I have GitHub set to 4 space tabs just like vscode, by the way, so it looks fine to me: and I don't see any advantage in a different setting since it still doesn't ensure a wayward tab is evident.)

agent/config/pbench-agent.cfg Outdated Show resolved Hide resolved
@dbutenhof
Copy link
Member Author

Why? Simply including it is not a security hazard, and it will increase the size of the image by about 0.00006%.

It feels very much like including testing code in the product: it's impure, and exposes what should be internal. It opens the risk that REQUESTS_CA_BUNDLE=/etc/pki/tls/certs will trust something that shouldn't be trusted. The idea makes me extremely uncomfortable, and I can't convince myself that it's "innocuous".

Final thought before I leave to hike... it occurs to me that this would bother me a bit less if we packaged it in our own file tree rather than in a standard system location, like /opt/pbench-agent/config/develop/pbench-devel-CA.crt where it's highly improbable that it could be referenced accidentally and where the path and name makes the purpose more obvious.

Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the idea of building our hack development CA into the real agent container.

It feels very much like including testing code in the product: it's impure, and exposes what should be internal.

I'll grant that. But, it's a very small impurity (of which it is far from the first...), and it enables substantial simplifications for the user. So, I think it is a good trade-off.

It opens the risk that REQUESTS_CA_BUNDLE=/etc/pki/tls/certs will trust something that shouldn't be trusted.

That's a fair point. (See below.)

now that we have --server, I would like to see the Agent config file disappear...so feel free to excise it from here! 😀

Yeah: done.

🎉

That closes out the discussion about virtual tabbing and HERE files, too.

🎊

(I have GitHub set to 4 space tabs just like vscode, by the way, so it looks fine to me: and I don't see any advantage in a different setting since it still doesn't ensure a wayward tab is evident.)

I was unaware of that setting. (However, I'm not sure I would like that better.)

it occurs to me that this would bother me a bit less if we packaged it in our own file tree rather than in a standard system location, like /opt/pbench-agent/config/develop/pbench-devel-CA.crt where it's highly improbable that it could be referenced accidentally and where the path and name makes the purpose more obvious.

I would be happy with that solution. I only put it in /etc/pki/tls/certs because I needed to put it somewhere and that seemed like a conventional location. (I was thinking that we would actually trust it in test configurations, but never carried through on that; but, IIUC, trusting it requires having it in there.)

As for what to name the CA cert file, we could go with pbench-devel-CA.crt or pbench-private-CA.crt, but I don't see much advantage to those over the current pbench_CA.crt.

Note that currently, we use the private CA:

  • in func-test to connect between the Server and Keycloak and between the client and the "inner" Nginx proxy; and,
  • in Staging to connect between the client and the "inner" proxy.

Once we move the Staging server behind the IntLab "outer" proxy, we won't use the CA in staging at all (just as we won't in production) -- instead, we'll use a cert signed by the IntLab "intermediate CA" -- so access to the private CA will only be an issue for func-test and development.

agent/config/pbench-agent.cfg Outdated Show resolved Hide resolved
ndokos
ndokos previously approved these changes Jul 21, 2023
Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the usage function is missing from the container_build.sh script. And, I'm requesting the ability to override the definition of WORKSPACE_TMP, because I actually do this in practice.

The rest of the comments are nits and small things.

agent/containers/images/container_build.sh Outdated Show resolved Hide resolved
agent/containers/images/container_build.sh Outdated Show resolved Hide resolved
agent/containers/images/container_build.sh Outdated Show resolved Hide resolved
agent/containers/images/container_build.sh Show resolved Hide resolved
agent/containers/images/container_build.sh Outdated Show resolved Hide resolved
agent/containers/images/container_build.sh Outdated Show resolved Hide resolved
contrib/containerized-pbench/pbench Outdated Show resolved Hide resolved
contrib/containerized-pbench/pbench Outdated Show resolved Hide resolved
contrib/containerized-pbench/pbench Show resolved Hide resolved
Comment on lines 406 to +415
if server:
path = config.get("results", "rest_endpoint")
uri = f"{server}/{path}"
self.ca = None
else:
uri = config.get("results", "server_rest_url")

# If the "server_ca" config variable isn't defined, we expect to verify
# using a registered CA.
self.ca = config.get("results", "server_ca", fallback=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code up-to-date with our current thinking?

As the code now stands (AFAICT), if we specify server, then we ignore the server_ca option in the config file, and we'll depend on REQUESTS_CA_BUNDLE if an override is needed; if, instead, we pull the uri from the config file, then we'll honor server_ca if it is provided...why the bifurcation? Why not use the config file CA value in both cases if it is provided? Or, why have the server_ca option at all when the user can use REQUESTS_CA_BUNDLE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config file defines a server and optionally a CA to go with that server. If you don't create a custom config file and use --server, there's not going to be a server_ca anyway. But if you have a config file, even with a server_ca, we can't necessarily assume that the --server target cert uses that CA.

So basically, config file server goes with config file CA; while --server goes with REQUESTS_CA_BUNDLE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear why we need this change and why it's better than just relying on REQUESTS_CA_BUNDLE for both cases. 🤷

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So your idea is that if someone installs the Agent RPM and relies on a server configured in pbench-agent.cfg so they can just pbench-results-move (e.g., the passthrough server) we should still require them to put a customized REQUESTS_CA_BUNDLE in the environment?

Granted, in the long run, we don't want to force users to rely on private CA certs at all, but this still seems worth doing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that, in production, access to the Pbench Server will require only publicly-available and trusted CAs (for some definition of "public"...e.g., "Red Hat internal", perhaps...), and so no specific action (e.g., defining REQUESTS_CA_BUNDLE) on the part of the user would be required.

For Staging, we should be operating in a similar environment (i.e., behind the IntLab proxies and using IntLab-signed or RH-IT-signed certs).

So, it's only for func-test and development that we would need to be using our private CA. "Normal users" wouldn't need it, unless they are working with bits that are too hot for Staging.

And, I think we have func-test already covered. So, it all really boils down to how do we support our developers. And, for that, I think we can rely on defining REQUESTS_CA_BUNDLE (and, we can probably do it implicitly inside a wrapper script or two, so it doesn't have to be reflected at all in the actual Agent or Server code).

PBENCH-1209

The staging server and local development servers now use HTTPS with certs
signed by a private pbench CA. An HTTPS connection can't be validated without
a reference to this CA.

The primary change here is in the `contrib/containerized-pbench/pbench` script
which now looks for a private CA definition, maps the CA bundle file into the
container, and defines `REQUESTS_CA_BUNDLE` within the container.

In an attempt to handle RPM installs, there's also logic to support a new
`[results]` section `pbench_ca` configuration variable to define a CA path
that will be used to verify the `PUT` to a server.

Note, this isn't being used for the `--relay` path, as that's currently not
using `https` and we'll need to figure out how we want to configure this in
the future.
I'm not entirely sure how far I want to take this. What I have so far allows a
user to use the `pbench` script to run a Pbench Agent command with a specified
CA bundle against a "runlocal" or the staging server by specifying the CA file
with `export PB_AGENT_CA=server/pbenchinacan/etc/pki/tls/certs/pbench_CA.crt`
before using `./contrib/containerized-pbench/pbench pbench-results-move`.

If you use `--server` or `--relay` you don't need to worry about having a
`~/.config/pbench/pbench-agent.cfg` file.

The remaining "problem" is that if you don't want to use `--server` and expect
to get your configuration from `~/.config/pbench/pbench-agent.cfg`, even if it
has the new `server_ca` configuration value set, there's no logic to cause the
CA to be mapped into the container so it won't actually work. (That is, you
still need the `PB_AGENT_CA` environment variable.) I find this unsatisfying;
but any solution I can see (e.g., unconditionally mapping in the private CA if
we can find it, or building a mechanism to look for `[results] server_ca` in
the config file, given there's no standard config-parser CLI available outside
the container) at least equally unsatisfying.

So for now, this is at least a hack allowing us to use the `pbench` container
script against staging and runlocal servers. Maybe that's enough.
Drop the attempt to create and manage a pbench-agent.cfg file.
Might as well default `verify=None` everywhere.
Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🦐

Comment on lines 406 to +415
if server:
path = config.get("results", "rest_endpoint")
uri = f"{server}/{path}"
self.ca = None
else:
uri = config.get("results", "server_rest_url")

# If the "server_ca" config variable isn't defined, we expect to verify
# using a registered CA.
self.ca = config.get("results", "server_ca", fallback=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear why we need this change and why it's better than just relying on REQUESTS_CA_BUNDLE for both cases. 🤷

@pravins
Copy link
Member

pravins commented Jul 25, 2023

I am just curious, can we use --server option to get rid of pbench-agent.conf file? May be out of context on this PR

@webbnh
Copy link
Member

webbnh commented Jul 25, 2023

I am just curious, can we use --server option to get rid of pbench-agent.cfg file?

I had the very same thought, but, when I started to look at it, it became clear that the answer is "no". If/when we get to using conventional packaging (a la the Fedora rules), we get much closer, but even then the answer might well be "no".

pbench-agent.cfg is just the user-configurable end (i.e., the overrides) of pbench-agent-default.cfg, where there is quite a bit of configuration. It's true that the biggest config item is the Server location, and we can now specify that via command line options, so there is little need for pbench-agent.cfg itself, other than as a mechanism for the Agent to load the rest of the configuration from pbench-agent-default.cfg...so, perhaps we could separate the two, but we still need the ability to specify the configuration or at least the parts of it which aren't custom to a given user.

May be out of context on this PR

Definitely. 😀

But, we probably should be considering how we specify the access schema for the Pbench Server -- having that in pbench-agent-defaults.cfg instead of inside the pbench_web_server value is inconvenient, although, I suppose we could just have people override the server_rest_url in pbench-agent.cfg as well. (I believe that pbench_web_server is used only to define server_rest_url and host_info_url, and I don't think that host_info_url is used after 0.69....)

@dbutenhof dbutenhof merged commit 2d0b189 into distributed-system-analysis:main Jul 26, 2023
3 checks passed
@dbutenhof dbutenhof deleted the agentca branch July 26, 2023 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants