Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: In cluster dialer to proxy TCP connections to unexposed services #688

Merged
merged 4 commits into from
Dec 6, 2021

Conversation

matejvasek
Copy link
Contributor

@matejvasek matejvasek commented Nov 30, 2021

Added code that will proxy TCP connections via in cluster pod.
This is useful for accessing k8s services that are not exposed.

The connections are created using standard Go dial function:
DialContext(ctx context.Context, network string, addr string) (net.Conn, error).

Mechanism:
Upon creation of dialer we create a pod with command sleep infinity.
Then each time DialContext function is called we execute socat - TCP:addr in the pod and use its stdio as source for our implementation of net.Conn.

Added code that will proxy TCP connections via in cluster pod.
This is useful for accessing k8s services that are not exposed.

Signed-off-by: Matej Vasek <mvasek@redhat.com>
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Nov 30, 2021
@knative-prow-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: matejvasek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 30, 2021
@codecov
Copy link

codecov bot commented Nov 30, 2021

Codecov Report

Merging #688 (ed0794f) into main (ccf0015) will increase coverage by 1.47%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #688      +/-   ##
==========================================
+ Coverage   38.03%   39.51%   +1.47%     
==========================================
  Files          42       42              
  Lines        3917     3950      +33     
==========================================
+ Hits         1490     1561      +71     
+ Misses       2225     2172      -53     
- Partials      202      217      +15     
Impacted Files Coverage Δ
repositories.go 52.00% <0.00%> (-1.95%) ⬇️
templates.go 79.10% <0.00%> (+0.64%) ⬆️
client.go 65.08% <0.00%> (+3.58%) ⬆️
cmd/root.go 59.25% <0.00%> (+11.59%) ⬆️
cmd/build.go 56.71% <0.00%> (+29.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ccf0015...ed0794f. Read the comment docs.

Signed-off-by: Matej Vasek <mvasek@redhat.com>
Signed-off-by: Matej Vasek <mvasek@redhat.com>
defer cancel()
delOpts := metaV1.DeleteOptions{}

return c.coreV1.Pods(c.namespace).Delete(ctx, c.podName, delOpts)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhuss @markusthoemmes is there a way to make pod to be automatically delete upon completion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do this if you are using a higher-level abstraction like Deployment or ReplicaSet if you want them to manage the lifecycle of your pod, otherwise, you have to do it on your own.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, what do you mean with 'upon' completion ? When all containers in the pod stop ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when all processes exited.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the app for some reason crashes the Close() may not be called so there would be dangling completed pods.

@matejvasek
Copy link
Contributor Author

@rhuss @markusthoemmes When I exec command in a pod it there a way to send a signal to the created process?

@rhuss
Copy link
Contributor

rhuss commented Dec 1, 2021

@rhuss @markusthoemmes When I exec command in a pod it there a way to send a signal to the created process?

Maybe with a second exec sh -c "cat /tmp/pid.txt | xargs kill" (and writing the PID to a pid file in the first exec) ?

@markusthoemmes
Copy link
Contributor

At least for the kubectl run case, closing stdin was the signal required to shut the process (and the pod) down. Does that not work for exec? I'd think the socat command only lives as long as stdin lives.

@matejvasek
Copy link
Contributor Author

At least for the kubectl run case, closing stdin was the signal required to shut the process (and the pod) down. Does that not work for exec? I'd think the socat command only lives as long as stdin lives.

Yes, closing stdin should endsocat just fine. I was just thinking if it hanged for some reason if there is a forceful way. But it shouldn't be needed.

@matejvasek
Copy link
Contributor Author

@rhuss @markusthoemmes
I wonder would it be good/sane idea to modify global http.DefaultTransport to use in cluster dialer for hostnames ending with .svc/.cluster.local?

Something like:

func init() {
	if dt, ok := http.DefaultTransport.(*http.Transport); ok {
		dc := dt.DialContext
		newDC := mixedDialer{defaultDialContext: dc}
		dt.DialContext = newDC.DialContext
	}
}

type mixedDialer struct {
	o                     sync.Once
	inClusterDialer       *contextDialer
	inClusterDialerFailed bool
	defaultDialContext    func(ctx context.Context, network, addr string) (net.Conn, error)
}

func (m *mixedDialer) DialContext(ctx context.Context, network string, addr string) (net.Conn, error) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return nil, err
	}
	if !m.inClusterDialerFailed && (strings.HasSuffix(host, ".svc") || strings.HasSuffix(host, ".cluster.local")) {
		m.o.Do(func() {
			m.inClusterDialer, err = NewInClusterDialer(context.Background())
			if err != nil {
				m.inClusterDialerFailed = true
			}
		})
		return m.inClusterDialer.DialContext(ctx, network, addr)
	}
	return m.defaultDialContext(ctx, network, addr)
}

func (m *mixedDialer) Close() error {
	return m.inClusterDialer.Close()
}

@markusthoemmes
Copy link
Contributor

markusthoemmes commented Dec 6, 2021

I've discussed a similar thing over at ko, but it completely breaks use cases where you're legitimately running your tool inside a cluster, where those hosts are reachable by default. I'd only do the shenanigans on "explicit" user request. I did contemplate doing some probing to find out if users might want this (i.e. call net.LookupName and fall back to auto-tnneling if the .svc based thingy doesn't resolve), but that might be a little too much magic.

.cluster.local also isn't a universal suffix.

@matejvasek
Copy link
Contributor Author

@knative-sandbox/kn-plugin-func-approvers is this good to go?

Copy link
Member

@lance lance left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matejvasek do you think the socat image would be better under the boson namespace at quay.io? I'm kind of concerned about it being under a personal account.

@matejvasek
Copy link
Contributor Author

@lance the image already exists in docker.io but I am mirroring it to avoid the pull limits. Also we might need to productize the image.

@lance lance changed the title In cluster dialer feat: In cluster dialer to proxy TCP connections to unexposed services Dec 6, 2021
Signed-off-by: Matej Vasek <mvasek@redhat.com>
@matejvasek
Copy link
Contributor Author

@lance I updated the image.

Copy link
Member

@lance lance left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 6, 2021
@knative-prow-robot knative-prow-robot merged commit 98ef5a0 into knative:main Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants