Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

TLS verification Helm operator #1484

Merged
merged 6 commits into from
Nov 1, 2018
Merged

Conversation

hiddeco
Copy link
Member

@hiddeco hiddeco commented Oct 31, 2018

First off: this bug wasn't completely our fault.

The Helm docs on this subject are giving you the idea that everything is configured as it should be but are actually ignoring the TLS CA verification.

Their example tells you to run:

$ helm ls --tls --tls-ca-cert ca.cert.pem --tls-cert helm.cert.pem --tls-key helm.key.pem

This configuration sends our client-side certificate to establish identity, uses the client key for encryption, and uses the CA certificate to validate the remote Tiller's identity.

The latter is not true, to actually validate the CA provided you have to add an additional configuration flag --tls-verify.

If you run the command with the --tls-verify flag you will get an error back:

# helm --tls --tls-verify --tls-ca-cert=/etc/fluxd/helm-ca/ca.crt --tls-cert=/etc/fluxd/helm/tls.crt --tls-key=/etc/fluxd/helm/tls.key ls
Error: x509: cannot validate certificate for 10.110.56.228 because it doesn't contain any IP SANs

This error brings you to the troubleshooting section on the same page:

If I use --tls-verify on the client, I get Error: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs

During the TLS handshake, a target, usually provided as a hostname (e.g. example.com), is checked against the subject and subject alternative names of the certificate (i.e. hostname verficiation). However, because of the tunnel, the target is an IP address. Therefore, to validate the certificate, the IP address 127.0.0.1 must be listed as an IP subject alternative name (IP SAN) in the Tiller certificate.

The solution would be to add the cluster IP of the tiller-deploy service to the Tiller certificate. This quite inconvenient for our users as they don't know the cluster IP beforehand and it's likely to change over time.

Alternative solution:

  • connections to Tiller are now made to the hostname instead of the cluster IP (<service>.<namespace>)
  • added notes to Helm operator docs about the required Common Name configuration for the Tiller cert
  • integrated --tls-hostname Helm client flag so user has full control over verification process

Bonus:

  • fixed the CA cert in the Helm template

Image available for testing: hiddeco/helm-operator:1465-tiller-tls


Fixes #1465 #1475

Hidde Beydals added 4 commits October 31, 2018 22:42
We do not connect to the FQDN but only to <service>.<namespace>
as the service DNS domain (.cluster.local) is configurable.
Gives the user control over the server name used to verify
the hostname on the returned certificates from Tiller.
@hiddeco hiddeco added the helm label Oct 31, 2018
@hiddeco hiddeco changed the title TLS vertification Helm operator TLS verification Helm operator Oct 31, 2018
Copy link
Member

@stefanprodan stefanprodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks @hiddeco

Copy link
Contributor

@Smirl Smirl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting to this so fast.

It might be worth mentioning in the release notes that the helm client version has been increased from 2.8.1 to 2.11.0 in case people have issues.

@hiddeco
Copy link
Member Author

hiddeco commented Nov 1, 2018

@Smirl the client already was 2.11.0 for the Gopkg.toml, version = "v2.8.1" == >=2.8.1, only thing keeping it back from updating was the Gopkg.lock.

Go dep only pins to a version if you make it either version = "=v2.8.1" (specific version) or version = "~v2.8.1" (2.8.x).

Ref: https://golang.github.io/dep/docs/Gopkg.toml.html#version

Copy link
Member

@squaremo squaremo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job on puzzling this out! There's what looks like a typo; otherwise all looks good to me. 💎

chart/flux/templates/helm-operator-deployment.yaml Outdated Show resolved Hide resolved
site/helm-operator.md Show resolved Hide resolved
@hiddeco
Copy link
Member Author

hiddeco commented Nov 1, 2018

@squaremo @stefanprodan thanks a lot for the reviews guys! 🥇

Stefan; when drafting the new Helm operator release it may be wise to mention the version increase of the internal Helm client. During my testing I was able to talk to a lower Tiller version (client v2.11.0 talking to Tiller v2.10.0) but officially this is not supported. (Thanks @Smirl for bringing this to light 🌞).

We support older clients talking to newer versions of tiller, which is part of that backwards-compatible guarantees we make.
...
I'm in favour of closing this out and not allowing newer clients talking to older tiller instances given that we have not made any guarantees towards maintaining forwards-compatibility.

Ref: helm/helm#4549 (comment)

@hiddeco hiddeco merged commit bfa8405 into fluxcd:master Nov 1, 2018
@hiddeco hiddeco deleted the 1465-tiller-tls branch November 1, 2018 13:04
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

helm-op: TLS verify causes context deadline exceeded error
4 participants