buffer_chunk_limit exceeded when using detach_process #930

Closed
cbroglie opened this issue May 3, 2016 · 14 comments

cbroglie commented May 3, 2016

Using a file buffer output plugin with detach_process results in chunk sizes >> buffer_chunk_limit when sending events at high speed. My settings specify 1MB chunk sizes, but it is easy to generate chunk sizes >50MB by writing 500k records (~200 bytes per record) in 4 seconds. My testing was done on a c4.2xl instance running Amazon Linux 2016.03 with td-agent2.

Config

<match kinesis.**>
  @type kinesis_producer
  log_level trace
  region us-west-2
  stream_name cbroglie-fluentd-test
  aws_key_id XXX
  aws_sec_key XXX
  buffer_chunk_limit 1m
  buffer_queue_limit 100
  flush_interval 1
  try_flush_interval 0.1
  queued_chunk_flush_interval 0.01
  num_threads 10
  detach_process 1
  buffer_type file
  buffer_path /var/log/td-agent/ztrack.*.buffer
  <secondary>
    @type file
    path /var/log/td-agent/ztrack.failed
  </secondary>
</match>

Benchmark driver

$ time ./fluent-bench -count 500000
2016/05/03 22:19:31 processed=500000 p50=14.89µs p99=31.768µs
2016/05/03 22:19:31 succeeded=500000
2016/05/03 22:19:31 failed=0

real    0m4.322s
user    0m11.632s
sys 0m3.628s
$

Resulting buffer files

$ ls -lh /var/log/td-agent/
total 90M
-rw-r--r-- 1 td-agent td-agent 4.3K May  3 22:19 td-agent.log
-rw-r--r-- 1 td-agent td-agent  54M May  3 22:19 ztrack..b531f781175491342.buffer
-rw-r--r-- 1 td-agent td-agent 5.9M May  3 22:19 ztrack..q531f78093ddcb2ad.buffer
-rw-r--r-- 1 td-agent td-agent  15M May  3 22:19 ztrack..q531f780a3027f5f1.buffer
-rw-r--r-- 1 td-agent td-agent  16M May  3 22:19 ztrack..q531f780d3171f187.buffer
$

Full background

I'm trying to use fluentd with the kinesis output plugin, and am currently trying to benchmark what throughput we can achieve. The output plugin uses the Amazon Kinesis Producer Library, which is a native C++ binary. I benchmarked the KPL native process at being able to sustain ~60k RPS (~10MB/s), and thus planned on using detach_process to launch multiple KPL processes to support higher throughput through fluentd.

The Kinesis output plugin works by sending the entire chunk to the KPL, and then waiting for a response for each record in the chunk before returning from write. It's critical that the chunk sizes don't exceed the limit, since sending massive batches to the KPL at once will cause records to timeout in the KPL while waiting to be sent, eventually resulting in the entire batch being retried.
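For reference, the write path looks roughly like this (a sketch only, using the standard BufferedOutput API; @kpl and its put_record/wait calls are stand-ins for the KPL client, not the plugin's actual code):

# Illustrative sketch, not the plugin's real implementation.
class KinesisLikeOutput < Fluent::BufferedOutput
  def format(tag, time, record)
    [tag, time, record].to_msgpack
  end

  def write(chunk)
    futures = []
    # Every record in the chunk is handed to the KPL up front...
    chunk.msgpack_each do |tag, time, record|
      futures << @kpl.put_record(record)  # placeholder for the real KPL call
    end
    # ...and write only returns once every record has been acknowledged, so
    # an oversized chunk keeps records queued in the KPL long enough to time
    # out, and the whole batch ends up being retried.
    futures.each(&:wait)                  # placeholder for the real wait API
  end
end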

@repeatedly
Member

Is this a fluent-plugin-kinesis issue or a fluentd core issue?
If it's the former, please don't use this repository for third-party plugin issues.

cbroglie commented May 3, 2016

The Kinesis plugin extends BufferedOutput, and the file buffering is done by the core BufferedOutput class. That's why I reported the issue here; please let me know if my understanding is incorrect and this is actually an issue specific to the Kinesis plugin.

@repeatedly
Member

Hm... so from your test, chunk sizes don't exceed 1+MB without detach_process, right?

cbroglie commented May 3, 2016

That's correct, it only happens when using detach_process

@repeatedly
Member

Okay. That's clear to me.

repeatedly commented May 3, 2016

@cbroglie Where is the fluent-bench code? Is this your test program?

repeatedly self-assigned this May 3, 2016
cbroglie commented May 3, 2016

Yes, that's my test program; I can create a gist.

@repeatedly
Member

Thx! I will try to fix the problem with your program 👍

cbroglie commented May 3, 2016

Pushed a slightly redacted version to https://github.com/cbroglie/fluent-bench. It requires Go 1.6 to build.

repeatedly added the bug label May 10, 2016
cbroglie commented May 15, 2016

@repeatedly I spent a little time looking into this, and the difference when running with detach_process is how BasicBuffer#emit is invoked:

  • When detach_process is not used, BasicBuffer#emit is invoked once per record sent to fluentd
  • When detach_process is used, BasicBuffer#emit is invoked less frequently and with large batches (sometimes larger than the buffer_chunk_limit setting, which triggers these warnings in the logs)

I tracked this further to the forwarder logic in process.rb:

def new_forwarder(w, interval)
  if interval < 0.2  # TODO interval
    Forwarder.new(w)                   # forwards each event over the IPC pipe immediately
  else
    DelayedForwarder.new(w, interval)  # buffers events and flushes them every `interval` seconds, producing large batches
  end
end

new_forwarder is always invoked with the hardcoded value of 0.5, and this interval is what ultimately controls the size of the batch passed to BasicBuffer#emit, without any consideration of buffer_chunk_limit.

I tested modifying new_forwarder to always create a Forwarder instance instead of a DelayedForwarder instance, and it fixes the issue of creating buffer files larger than the chunk size (since BasicBuffer#emit is now called once per record, the same behavior as when running without detach_process). I have not yet measured the effect on performance.
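For reference, the modification is essentially just collapsing the branch so the immediate Forwarder is always used:

def new_forwarder(w, interval)
  # Always forward each event immediately; skipping DelayedForwarder means
  # BasicBuffer#emit sees one record at a time, so chunks stay within
  # buffer_chunk_limit (interval is ignored here).
  Forwarder.new(w)
end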

cbroglie commented May 15, 2016

It seems to be performant enough: I was able to sustain 16k RPS after making the change, with the following configuration:

<match kinesis.**>
  @type kinesis_producer
  log_level info
  region us-west-2
  stream_name cbroglie-fluentd-test-10
  aws_key_id xxx
  aws_sec_key xxx
  buffer_chunk_limit 500k
  buffer_queue_limit 40000
  flush_interval 1
  try_flush_interval 0.1
  queued_chunk_flush_interval 0.01
  num_threads 25
  buffer_type file
  buffer_path /var/log/td-agent/ztrack*.buffer
  detach_process 3
  disable_retry_limit true
</match>

And I can sustain higher rates by increasing the value of detach_process.

repeatedly commented May 16, 2016

And I can sustain higher rates by increasing the value of detach_process.

Good.

process.rb was implemented to improve plugin performance, but existing plugins focus on hiding flush latency with num_threads. This is why detach_process is not popular and several TODOs still remain...

We could add an interval option to control the forwarding interval, but we are currently focused on v0.14 development with its multi-process support. So if this is not urgent, we would like to postpone the process.rb improvement.

@cbroglie
Author

Yeah, the reason I need detach_process is that the Kinesis output plugin is currently CPU bound, so increasing num_threads doesn't help. I will see if I can optimize that plugin, or else submit a PR to make the interval configurable.

cbroglie commented May 23, 2016

@repeatedly I don't have any good options for fixing the CPU bottleneck in the Kinesis output plugin at the moment, so I opened a PR (#982) which makes the interval in process.rb configurable. It's not a complete solution, but it's enough for me to work around the problem for now.
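Conceptually the change is small: the call site that currently passes the hardcoded 0.5 to new_forwarder reads the value from configuration instead. Roughly (variable and option names below are illustrative, not necessarily what the PR uses):

# Before (process.rb today): the forwarding interval is a literal.
forwarder = new_forwarder(w, 0.5)

# After: the interval comes from configuration, defaulting to the old value.
forwarder = new_forwarder(w, @forwarder_interval)  # e.g. configured as 0.1

# Any value below 0.2 makes new_forwarder return the immediate Forwarder,
# which keeps chunk sizes within buffer_chunk_limit.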

cbroglie closed this as completed Jun 7, 2016