Issue 1726: Fix for in_tcp log corruption under load. #1729

Merged 1 commit on Nov 3, 2017

AM-iain (Contributor) commented Oct 27, 2017

The TCP input plugin shares a single buffer across all connections. Under load
the buffer is sometimes truncated. The suspected cause is that concurrent
connections race to modify it:

1. A long message is received.
2. A short message is received while the long message is still being parsed.
3. The short message is parsed, so the buffer is about to be truncated.
4. The buffer, which now contains "longmessage\nshort\n", is truncated by the
   length of the short message. It is now "longm".
5. Another message arrives. The buffer is now "longmanother\n", which does not
   parse.
6. The whole buffer is thrown away, and subsequent messages are received and
   handled as usual.
7. Eventually the pattern repeats.

The fix is to use a per-connection buffer.
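
For illustration only, here is a minimal sketch of why a shared buffer mixes up data from different connections and how a per-connection buffer avoids it. It is a simplified model, not fluentd's actual code: the class names, `drain`, `replay`, and the chunk sequence are all assumptions made for this example, and it shows interleaved partial chunks rather than the exact truncation sequence described above.

```ruby
# Simplified model of a newline-delimited TCP input. Hypothetical code,
# not taken from fluentd.
DELIMITER = "\n"

# Extract every complete message from +buffer+ and drop the consumed bytes,
# leaving any trailing partial message in place for the next chunk.
def drain(buffer)
  messages = []
  pos = 0
  while (i = buffer.index(DELIMITER, pos))
    messages << buffer[pos...i]
    pos = i + DELIMITER.length
  end
  buffer.slice!(0, pos) if pos > 0
  messages
end

# One buffer for every connection (the situation the description reports).
class SharedBufferInput
  def initialize
    @buffer = +""
  end

  def on_data(_conn_id, data)
    @buffer << data
    drain(@buffer)
  end
end

# One buffer per connection (the shape of the fix).
class PerConnectionInput
  def initialize
    @buffers = Hash.new { |hash, conn_id| hash[conn_id] = +"" }
  end

  def on_data(conn_id, data)
    buffer = @buffers[conn_id]
    buffer << data
    drain(buffer)
  end
end

# Connection :a delivers one message in two chunks; connection :b delivers
# a complete message in between.
CHUNKS = [[:a, "longmess"], [:b, "short\n"], [:a, "age\n"]].freeze

# Feed the chunk sequence to an input and collect every parsed message.
def replay(input)
  CHUNKS.flat_map { |conn_id, data| input.on_data(conn_id, data) }
end

p replay(SharedBufferInput.new)   # messages from :a and :b are fused
p replay(PerConnectionInput.new)  # each connection's messages stay intact
```

With the shared buffer, the partial chunk from one connection is glued to the next connection's data; with one buffer per connection, both messages come out whole.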

repeatedly self-assigned this Nov 1, 2017
repeatedly added the bug and v0.14 labels Nov 1, 2017
repeatedly merged commit 6745366 into fluent:master Nov 3, 2017

repeatedly (Member) commented

Reporter confirmed this patch fixed the issue. Thanks!

AM-iain deleted the issue1726 branch November 3, 2017 09:48

sampointer commented Nov 6, 2017

It would be a great help if this could be tagged as 0.14.23.

sampointer commented

I am a colleague of @AM-iain. We've tested this on our infrastructure successfully. 0.14.23 would be a great help.

mururu pushed a commit to mururu/fluentd that referenced this pull request Dec 28, 2017
fluent#1729 introduced @buffer to
TCPCallbackSocket. TLSCallbackSocket has to have it to be used for
in_tcp.
mururu pushed a commit to mururu/fluentd that referenced this pull request Dec 28, 2017
fluent#1729 introduced @buffer to
TCPCallbackSocket. TLSCallbackSocket also has to have it to be used for
in_tcp.
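
As a rough sketch of why the follow-up commits were needed: once the data handler reads its buffer from the connection object, every socket class passed to that handler has to provide one. The classes and the `on_data` handler below are hypothetical stand-ins written for this example; they are not fluentd's TCPCallbackSocket or TLSCallbackSocket.

```ruby
# Hypothetical stand-ins, not fluentd's classes.
class SketchTCPSocket
  attr_reader :buffer

  def initialize
    @buffer = +""
  end
end

# A TLS-style socket that was never given a buffer.
class SketchTLSSocketWithoutBuffer
end

# The same socket after gaining its own buffer.
class SketchTLSSocket
  attr_reader :buffer

  def initialize
    @buffer = +""
  end
end

# Handler in the style of a per-connection input callback: it appends
# incoming data to the buffer owned by the connection.
def on_data(data, conn)
  conn.buffer << data
  conn.buffer
end

on_data("a\n", SketchTCPSocket.new)  # works
on_data("a\n", SketchTLSSocket.new)  # works
begin
  on_data("a\n", SketchTLSSocketWithoutBuffer.new)
rescue NoMethodError => e
  puts "socket without a buffer fails: #{e.class}"
end
```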