Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Kafka cluster connections #3397

Closed
pierDipi opened this issue Oct 16, 2023 · 6 comments · Fixed by #3493
Closed

Optimize Kafka cluster connections #3397

pierDipi opened this issue Oct 16, 2023 · 6 comments · Fixed by #3493

Comments

@pierDipi
Copy link
Member

pierDipi commented Oct 16, 2023

Problem

In our controllers, we often re-create Kafka cluster connections, the original reason was this issue IBM/sarama#2060 . Now, it's fixed we need to re-review our Kafka cluster interactions in the control plane (for all resources) and see if we can optimize them by reducing the number of TCP connections created.

Persona:
Which persona is this feature for?
*

Exit Criteria

Lower number of Kafka connections for a single resource reconciliation

Time Estimate (optional):
How many developer-days do you think this may take to resolve?
7

Additional context (optional)

/kind performance

@pierDipi
Copy link
Member Author

For monitoring and viewing tcp connections for manual testing we could try to use https://www.inspektor-gadget.io/docs/v0.20.0/gadgets/top/tcp/

@Cali0707
Copy link
Member

Cali0707 commented Nov 22, 2023

Hey @pierDipi for this issue, are you picturing something like a connection pool now that the connections can re-authenticate without being re-created?

edit: I realized it's more complicated than a simple pool as different connections may need different bootstrap servers and/or config options. I'm going to test:

  1. A simple set where whenever a new (bootstrap server, config) pair is used to request a client, a new connection is opened
  2. The above with LRU caching to keep max connections < some number

/assign

@pierDipi
Copy link
Member Author

Let's keep in mind that connection pools and caches are very prone to security issues, so let's have a design that reduces the risk of security issues without over-optimizing too early

@Cali0707
Copy link
Member

Cali0707 commented Nov 23, 2023

Let's keep in mind that connection pools and caches are very prone to security issues, so let's have a design that reduces the risk of security issues without over-optimizing too early

Do you have some other ideas? My initial really simple idea was to just create one admin client per Reconciler struct that needs one, but I wasn't sure if that would make too many connections overall, and also different brokers may have different bootstrap servers/security combinations which would mean the same client would not work so I thought of the pooling. WDYT @pierDipi ?

Copy link
Contributor

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 22, 2024
@Cali0707
Copy link
Member

Cali0707 commented Feb 22, 2024

/triage accepted
/remove-lifecycle stale

@knative-prow knative-prow bot added the triage/accepted Issues which should be fixed (post-triage) label Feb 22, 2024
@Cali0707 Cali0707 removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. triage/accepted Issues which should be fixed (post-triage) labels Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants