
[Bug] exporter causes too many TIME_WAIT with a scrape interval of 5s #12

Closed
imryao opened this issue Dec 19, 2021 · 1 comment · Fixed by #13

imryao (Contributor) commented Dec 19, 2021

Hi @wi1dcard! I really like this project, but I've found something unexpected recently.
I'm using Prometheus with a scrape_interval of 5s. When I checked the logs of v2ray, I found many entries like this:

2021/12/19 11:26:03 [Info] [109249963] proxy/dokodemo: received request for 172.31.0.4:49388
2021/12/19 11:26:03 [Info] [109249963] app/dispatcher: taking detour [api] for [tcp:127.0.0.1:0]
2021/12/19 11:26:03 [Info] [109249963] app/proxyman/inbound: connection ends > proxy/dokodemo: connection ends > proxy/dokodemo: failed to transport request > read tcp 172.31.0.3:8888->172.31.0.4:49388: read: connection reset by peer
2021/12/19 11:26:08 [Info] [1714658193] proxy/dokodemo: received request for 172.31.0.4:49390
2021/12/19 11:26:08 [Info] [1714658193] app/dispatcher: taking detour [api] for [tcp:127.0.0.1:0]
2021/12/19 11:26:08 [Info] [1714658193] app/proxyman/inbound: connection ends > proxy/dokodemo: connection ends > proxy/dokodemo: failed to transport request > read tcp 172.31.0.3:8888->172.31.0.4:49390: read: connection reset by peer
2021/12/19 11:26:13 [Info] [3724667099] proxy/dokodemo: received request for 172.31.0.4:49392
2021/12/19 11:26:13 [Info] [3724667099] app/dispatcher: taking detour [api] for [tcp:127.0.0.1:0]
2021/12/19 11:26:13 [Info] [3724667099] app/proxyman/inbound: connection ends > proxy/dokodemo: connection ends > proxy/dokodemo: failed to transport request > read tcp 172.31.0.3:8888->172.31.0.4:49392: read: connection reset by peer

Here, 172.31.0.3 is v2ray and 172.31.0.4 is the exporter.

Then I checked netstat on the exporter:

Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 172.31.0.4:49562        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49582        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49584        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49570        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49566        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49578        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49574        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49564        172.31.0.3:8888         TIME_WAIT  
tcp        0      0 172.31.0.4:49580        172.31.0.3:8888         TIME_WAIT  
tcp6       0      0 172.31.0.4:9550         172.31.0.2:51808        ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node   Path

Here, 172.31.0.2 is the reverse proxy.

As you can see, there are about 10 connections in TIME_WAIT.

Then I checked the source code and found that the exporter creates a new gRPC connection every time it receives a scrape request:

func (e *Exporter) scrapeV2Ray(ch chan<- prometheus.Metric) error {
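	// Note: a new gRPC connection is dialed for every scrape and closed again
	// when the scrape returns (defer conn.Close() below).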
	ctx, cancel := context.WithTimeout(context.Background(), e.scrapeTimeout)
	defer cancel()

	conn, err := grpc.DialContext(ctx, e.endpoint, grpc.WithInsecure(), grpc.WithBlock())
	if err != nil {
		return fmt.Errorf("failed to dial: %w, timeout: %v", err, e.scrapeTimeout)
	}
	defer conn.Close()

	client := command.NewStatsServiceClient(conn)

	if err := e.scrapeV2RaySysMetrics(ctx, ch, client); err != nil {
		return err
	}

	if err := e.scrapeV2RayMetrics(ctx, ch, client); err != nil {
		return err
	}

	return nil
}

I think this is what causes the TIME_WAIT connections shown above. I wonder if we could reuse the gRPC connection instead of creating a new client on every scrape.
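
For example, something like the following could work. This is just a rough sketch: it assumes the Exporter keeps a long-lived conn field that is dialed once when the exporter is constructed, and the constructor name and field are only illustrative (same imports and helper methods as the snippet above):

func NewExporter(endpoint string, scrapeTimeout time.Duration) (*Exporter, error) {
	// Dial once; grpc.Dial is non-blocking, so the connection is established
	// lazily and re-established by gRPC itself if it drops.
	conn, err := grpc.Dial(endpoint, grpc.WithInsecure())
	if err != nil {
		return nil, fmt.Errorf("failed to dial: %w", err)
	}
	return &Exporter{endpoint: endpoint, scrapeTimeout: scrapeTimeout, conn: conn}, nil
}

func (e *Exporter) scrapeV2Ray(ch chan<- prometheus.Metric) error {
	ctx, cancel := context.WithTimeout(context.Background(), e.scrapeTimeout)
	defer cancel()

	// Reuse the long-lived connection; there is no per-scrape dial and no
	// per-scrape close, so scrapes no longer leave sockets in TIME_WAIT.
	client := command.NewStatsServiceClient(e.conn)

	if err := e.scrapeV2RaySysMetrics(ctx, ch, client); err != nil {
		return err
	}

	return e.scrapeV2RayMetrics(ctx, ch, client)
}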

Looking forward to your reply!

imryao (Contributor, Author) commented Dec 19, 2021

Hi @wi1dcard! I've made a bugfix here:
#13
Looking forward to your comments!
