buffer: add log for time periods of restored chunks which may be broken #4028

Merged


@daipom (Contributor) commented Jan 26, 2023

Which issue(s) this PR fixes:
Partial fix for #3970

What this PR does / why we need it:
Add logs of the possible time periods of buffer corruption during buffer resuming.
When one of the remaining chunk files is found to be broken during resuming, this logs the created_at and modified_at of the other files, which may also be broken without having been detected.

Since a broken chunk file was found, it is possible that other files remaining at the time of resuming were also broken. Here is the list of the files.
  /path/buffer.bxxxx.log: created_at=2023-01-26 18:08:16 +0900 modified_at=2023-01-26 18:08:17 +0900
  /path/buffer.qxxxx.log: created_at=2023-01-26 18:08:16 +0900 modified_at=2023-01-26 18:08:17 +0900
  {...}

Fluentd can detect corruption when the metadata file is broken, but cannot detect it when only the chunk file itself is broken.
So once one corruption is detected, we should assume that the other files may also be broken.
This log lets us know the other time periods in which corruption may have occurred.
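The behavior described above can be sketched in Ruby as follows. This is a minimal, hypothetical sketch, not Fluentd's actual FileBuffer code: `RestoredChunk`, `resume_chunks`, and the `meta_ok` flag are illustrative names, and `meta_ok` stands in for whatever check detects a broken meta file. Only the message text and line format are taken from the log output in this PR.

```ruby
# Hypothetical stand-in for a restored buffer chunk (not Fluentd's real class).
# meta_ok is an assumed flag meaning "the meta file could be read".
RestoredChunk = Struct.new(:path, :created_at, :modified_at, :meta_ok, keyword_init: true)

HEADER = "Since a broken chunk file was found, it is possible that other files " \
         "remaining at the time of resuming were also broken. Here is the list of the files."

# Splits chunks into restored and broken ones. If at least one chunk is broken,
# also returns log lines listing every restored chunk with its time period.
def resume_chunks(chunks)
  restored, broken = chunks.partition(&:meta_ok)
  # Fluentd deletes the broken chunk's files at this point; this sketch just drops them.
  return [restored, []] if broken.empty?

  lines = [HEADER]
  restored.each do |c|
    lines << "  #{c.path}: created_at=#{c.created_at} modified_at=#{c.modified_at}"
  end
  [restored, lines]
end

# Usage with the example path from the description; "buffer.byyyy.log" is made up.
t_created  = Time.new(2023, 1, 26, 18, 8, 16, "+09:00")
t_modified = Time.new(2023, 1, 26, 18, 8, 17, "+09:00")
chunks = [
  RestoredChunk.new(path: "/path/buffer.bxxxx.log", created_at: t_created, modified_at: t_modified, meta_ok: true),
  RestoredChunk.new(path: "/path/buffer.byyyy.log", created_at: t_created, modified_at: t_modified, meta_ok: false),
]
restored, lines = resume_chunks(chunks)
puts lines
```

The list is emitted only when something broken was actually found, so a clean restart produces no extra log noise.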

Docs Changes:

Release Note:
Add logs for the time periods of restored buffer chunks that may be broken.

How to Reproduce

  • Prepare a config:
<source>
  @type sample
  tag test.a
</source>

<source>
  @type sample
  tag test.b
</source>

<source>
  @type sample
  tag test.c
</source>

<match test.**>
  @type file
  path /test/fluentd/log/${tag}/%Y%m%d/fluentd.log
  append true
  add_path_suffix false
  <buffer time,tag>
    @type file
    path /test/fluentd/buffer
    timekey 24h
    flush_mode interval
    flush_interval 10s
    overflow_action drop_oldest_chunk
  </buffer>
</match>
  • Start Fluentd; you should get 3 chunk files.
  • Stop Fluentd so that the 3 files remain.
  • Break one of the meta files:
$ head -c 89 /dev/zero > buffer.b5f32716d72fc5d71592aa89c5865fd48.log.meta
  • Restart Fluentd; you should see logs like the following:
[info]: #0 fluent/log.rb:330:info: starting fluentd worker pid=920781 ppid=920761 worker=0
[debug]: #0 fluent/log.rb:309:debug: restoring buffer file: path = /test/fluentd/buffer/buffer.b5f32716d7292f8138b36fd759abf7207.log
[debug]: #0 fluent/log.rb:309:debug: restoring buffer file: path = /test/fluentd/buffer/buffer.b5f32716d72fc5d71592aa89c5865fd48.log
[error]: #0 fluent/log.rb:372:error: found broken chunk file during resume. Deleted corresponding files: path="/test/fluentd/buffer/buffer.b5f32716d72fc5d71592aa89c5865fd48.log" mode=:staged err_msg="staged meta file is broken. no implicit conversion of Symbol into Integer"
[debug]: #0 fluent/log.rb:309:debug: restoring buffer file: path = /test/fluentd/buffer/buffer.b5f32716d734618fef772d3ae48fd577a.log
[info]: #0 fluent/log.rb:330:info: Since a broken chunk file was found, it is possible that other files remaining at the time of resuming were also broken. Here is the list of the files.
[info]: #0 fluent/log.rb:330:info:   /test/fluentd/buffer/buffer.b5f32716d7292f8138b36fd759abf7207.log: created_at=2023-01-26 18:08:16 +0900 modified_at=2023-01-26 18:08:17 +0900
[info]: #0 fluent/log.rb:330:info:   /test/fluentd/buffer/buffer.b5f32716d734618fef772d3ae48fd577a.log: created_at=2023-01-26 18:08:16 +0900 modified_at=2023-01-26 18:08:17 +0900
[debug]: #0 fluent/log.rb:309:debug: buffer started instance=3000 stage_size=216 queue_size=0
[info]: #0 fluent/log.rb:330:info: fluentd worker is now running worker=0
  • This allows us to know other possible time periods of corruption.
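To turn the logged list into concrete time windows to investigate, the log output can be parsed. A minimal sketch, assuming the line format shown in the logs above; `suspect_periods` and `ENTRY_RE` are hypothetical names, not part of Fluentd, and the regex assumes buffer paths end in `.log` and contain no spaces.

```ruby
require "time"

# Matches one entry of the "possibly broken" list, e.g.
#   "...info:   /path/buffer.bxxxx.log: created_at=... modified_at=..."
ENTRY_RE = /(\S+\.log): created_at=(.+?) modified_at=(.+)$/

# Hypothetical helper: returns one {path:, from:, to:} hash per listed file.
def suspect_periods(log_text)
  periods = []
  log_text.each_line do |line|
    m = ENTRY_RE.match(line)
    next unless m
    periods << { path: m[1], from: Time.parse(m[2]), to: Time.parse(m[3]) }
  end
  periods
end

log = <<~'LOG'
  [info]: #0 fluent/log.rb:330:info:   /test/fluentd/buffer/buffer.b5f32716d7292f8138b36fd759abf7207.log: created_at=2023-01-26 18:08:16 +0900 modified_at=2023-01-26 18:08:17 +0900
  [info]: #0 fluent/log.rb:330:info:   /test/fluentd/buffer/buffer.b5f32716d734618fef772d3ae48fd577a.log: created_at=2023-01-26 18:08:16 +0900 modified_at=2023-01-26 18:08:17 +0900
  [debug]: #0 fluent/log.rb:309:debug: buffer started instance=3000 stage_size=216 queue_size=0
LOG

periods = suspect_periods(log)
# Overall window in which undetected corruption may have happened.
window = [periods.map { |p| p[:from] }.min, periods.map { |p| p[:to] }.max]
periods.each { |p| puts "#{p[:path]}: #{p[:from]} .. #{p[:to]}" }
puts "overall: #{window[0]} .. #{window[1]}"
```

Lines that are not list entries (such as the `debug` line) are skipped because they never contain `: created_at=`.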

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
@ashie (Member) left a comment


LGTM

@ashie merged commit 8c2e9dd into fluent:master Jan 27, 2023
@daipom deleted the log-restored-buffer-time-period-possibly-broken branch January 27, 2023 10:06
@daipom (Contributor, Author) commented Jan 27, 2023

Thanks for merging!

@ashie added this to the v1.16.0 milestone Feb 9, 2023
daipom added a commit to daipom/fluentd-docs-gitbook that referenced this pull request Mar 29, 2023
* fluent/fluentd#4025
* fluent/fluentd#4028

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>