buffer_chunk_limit exceeded when using detach_process #930

Closed
cbroglie opened this issue May 3, 2016 · 14 comments

cbroglie commented May 3, 2016

Using a file buffer output plugin with detach_process results in chunk sizes >> buffer_chunk_limit when sending events at high speed. My settings specify 1MB chunk sizes, but it is easy to generate chunk sizes >50MB by writing 500k records (~200 bytes per record) in 4 seconds. My testing was done on a c4.2xl instance running Amazon Linux 2016.03 with td-agent2.

Config

<match kinesis.**>
  @type kinesis_producer
  log_level trace
  region us-west-2
  stream_name cbroglie-fluentd-test
  aws_key_id XXX
  aws_sec_key XXX
  buffer_chunk_limit 1m
  buffer_queue_limit 100
  flush_interval 1
  try_flush_interval 0.1
  queued_chunk_flush_interval 0.01
  num_threads 10
  detach_process 1
  buffer_type file
  buffer_path /var/log/td-agent/ztrack.*.buffer
  <secondary>
    @type file
    path /var/log/td-agent/ztrack.failed
  </secondary>
</match>

Benchmark driver

$ time ./fluent-bench -count 500000
2016/05/03 22:19:31 processed=500000 p50=14.89µs p99=31.768µs
2016/05/03 22:19:31 succeeded=500000
2016/05/03 22:19:31 failed=0

real    0m4.322s
user    0m11.632s
sys 0m3.628s
$

Resulting buffer files

$ ls -lh /var/log/td-agent/
total 90M
-rw-r--r-- 1 td-agent td-agent 4.3K May  3 22:19 td-agent.log
-rw-r--r-- 1 td-agent td-agent  54M May  3 22:19 ztrack..b531f781175491342.buffer
-rw-r--r-- 1 td-agent td-agent 5.9M May  3 22:19 ztrack..q531f78093ddcb2ad.buffer
-rw-r--r-- 1 td-agent td-agent  15M May  3 22:19 ztrack..q531f780a3027f5f1.buffer
-rw-r--r-- 1 td-agent td-agent  16M May  3 22:19 ztrack..q531f780d3171f187.buffer
$

Full background

I'm trying to use fluentd with the kinesis output plugin, and am currently trying to benchmark what throughput we can achieve. The output plugin uses the Amazon Kinesis Producer Library, which is a native C++ binary. I benchmarked the KPL native process at being able to sustain ~60k RPS (~10MB/s), and thus planned on using detach_process to launch multiple KPL processes to support higher throughput through fluentd.

The Kinesis output plugin works by sending the entire chunk to the KPL, and then waiting for a response for each record in the chunk before returning from write. It's critical that the chunk sizes don't exceed the limit, since sending massive batches to the KPL at once will cause records to timeout in the KPL while waiting to be sent, eventually resulting in the entire batch being retried.
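For reference, the write path looks roughly like this (a sketch only, using the standard BufferedOutput API; @kpl and its put_record/wait calls are stand-ins for the KPL client, not the plugin's actual code):

# Illustrative sketch, not the plugin's real implementation.
class KinesisLikeOutput < Fluent::BufferedOutput
  def format(tag, time, record)
    [tag, time, record].to_msgpack
  end

  def write(chunk)
    futures = []
    # Every record in the chunk is handed to the KPL up front...
    chunk.msgpack_each do |tag, time, record|
      futures << @kpl.put_record(record)  # placeholder for the real KPL call
    end
    # ...and write only returns once every record has been acknowledged, so
    # an oversized chunk keeps records queued in the KPL long enough to time
    # out, and the whole batch ends up being retried.
    futures.each(&:wait)                  # placeholder for the real wait API
  end
end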

@repeatedly
Member

Is this a fluent-plugin-kinesis issue or a fluentd core issue?
If it's the former, please don't use this repository for third-party plugin issues.

cbroglie commented May 3, 2016

The Kinesis plugin extends BufferedOutput, and the file buffering is done by the core BufferedOutput class. That's why I reported the issue here; please let me know if my understanding is incorrect and this is actually an issue specific to the Kinesis plugin.

@repeatedly
Member

Hm... so from your test, chunk sizes don't exceed 1+MB without detach_process, right?

cbroglie commented May 3, 2016

That's correct, it only happens when using detach_process

@repeatedly
Member

Okay. That's clear to me.

repeatedly commented May 3, 2016

@cbroglie Where is the fluent-bench code? Is this your test program?

repeatedly self-assigned this May 3, 2016
cbroglie commented May 3, 2016

Yes, that's my test program; I can create a gist.

@repeatedly
Member

Thx! I will try to fix the problem with your program 👍

cbroglie commented May 3, 2016

Pushed a slightly redacted version to https://github.com/cbroglie/fluent-bench. It requires Go 1.6 to build.

repeatedly added the bug label May 10, 2016
cbroglie commented May 15, 2016

@repeatedly I spent a little time looking into this, and the difference when running with detach_process is how BasicBuffer#emit is invoked:

  • When detach_process is not used, BasicBuffer#emit is invoked once per record sent to fluentd
  • When detach_process is used, BasicBuffer#emit is invoked less frequently and with large batches (sometimes larger than the buffer_chunk_limit setting, which triggers these warnings in the logs)

I tracked this further to the forwarder logic in process.rb:

def new_forwarder(w, interval)
  if interval < 0.2  # TODO interval
    Forwarder.new(w)                   # forwards each event over the IPC pipe immediately
  else
    DelayedForwarder.new(w, interval)  # buffers events and flushes them every `interval` seconds, producing large batches
  end
end

new_forwarder is always invoked with the hardcoded value of 0.5, and this interval is what ultimately controls the size of the batch passed to BasicBuffer#emit, without any consideration of buffer_chunk_limit.

I tested modifying new_forwarder to always create a Forwarder instance instead of a DelayedForwarder instance, and it fixes the issue of creating buffer files larger than the chunk size (since BasicBuffer#emit is now called once per record, the same behavior as when running without detach_process). I have not yet measured the effect on performance.
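For reference, the modification is essentially just collapsing the branch so the immediate Forwarder is always used:

def new_forwarder(w, interval)
  # Always forward each event immediately; skipping DelayedForwarder means
  # BasicBuffer#emit sees one record at a time, so chunks stay within
  # buffer_chunk_limit (interval is ignored here).
  Forwarder.new(w)
end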

cbroglie commented May 15, 2016

It seems to be performant enough: I was able to sustain 16k RPS after making the change, with the following configuration:

<match kinesis.**>
  @type kinesis_producer
  log_level info
  region us-west-2
  stream_name cbroglie-fluentd-test-10
  aws_key_id xxx
  aws_sec_key xxx
  buffer_chunk_limit 500k
  buffer_queue_limit 40000
  flush_interval 1
  try_flush_interval 0.1
  queued_chunk_flush_interval 0.01
  num_threads 25
  buffer_type file
  buffer_path /var/log/td-agent/ztrack*.buffer
  detach_process 3
  disable_retry_limit true
</match>

And I can sustain higher rates by increasing the value of detach_process.

repeatedly commented May 16, 2016

And I can sustain higher rates by increasing the value of detach_process.

Good.

process.rb was implemented to improve plugin performance, but existing plugins focus on hiding flush latency with num_threads. This is why detach_process is not popular and several TODOs still remain...

We could add an interval option to control the forwarding interval, but we are currently focused on v0.14 development with its multi-process support. So if this is not urgent, we would like to postpone the process.rb improvement.

@cbroglie
Author

Yeah, the reason I need detach_process is that the Kinesis output plugin is currently CPU bound, so increasing num_threads doesn't help. I will see if I can optimize that plugin, or else submit a PR to make the interval configurable.

cbroglie commented May 23, 2016

@repeatedly I don't have any good options for fixing the CPU bottleneck in the Kinesis output plugin at the moment, so I opened a PR (#982) which makes the interval in process.rb configurable. It's not a complete solution, but it's enough for me to work around the problem for now.
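Conceptually the change is small: the call site that currently passes the hardcoded 0.5 to new_forwarder reads the value from configuration instead. Roughly (variable and option names below are illustrative, not necessarily what the PR uses):

# Before (process.rb today): the forwarding interval is a literal.
forwarder = new_forwarder(w, 0.5)

# After: the interval comes from configuration, defaulting to the old value.
forwarder = new_forwarder(w, @forwarder_interval)  # e.g. configured as 0.1

# Any value below 0.2 makes new_forwarder return the immediate Forwarder,
# which keeps chunk sizes within buffer_chunk_limit.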

cbroglie closed this as completed Jun 7, 2016