Prevent thread switching in the interval between seek and write operations to pos_file #2118

Merged
merged 2 commits into from
Sep 18, 2018

Conversation

vitclone
Contributor

We're migrating to Fluentd to transfer web-server logs to HDFS, and I'm surprised that we seem to be the first to hit this issue.

After deploying to one server under production load (read_from_head true), we restarted td-agent and, after a while, found duplicated log records in HDFS. A quick examination revealed inconsistencies in the pos_file, such as this line:
/data/htlogs/00000000b2ebab5f/data/htlogs/nginx/w11009.log 00000000029ba830 000000000000601a9/data00000000000332a15875.log 0000000000000000 0000000000421e96

I added debug logging and found that seek and write operations from different threads were interleaving. One thread was updating the current position of a log file (set file.pos, write) while a second thread was adding new records to pos_file (set file.pos, write, write, read file.pos, write, read file.pos).
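To illustrate the idea of the fix, here is a minimal sketch that keeps each seek and its following write under a single lock, so another thread cannot move the file offset in between. The class name `GuardedPosFile` and its methods are hypothetical and are not Fluentd's actual code; the entry layout (path, tab, 16 hex digit offset, tab, 16 hex digit inode) is also an assumption.

```ruby
require "stringio"

# Hypothetical wrapper around a pos_file handle; not Fluentd's real class.
class GuardedPosFile
  def initialize(io)
    @io = io
    @lock = Mutex.new
  end

  # Overwrite bytes at a fixed offset. The seek and the write happen
  # under one lock, so no other thread can change the offset between them.
  def write_at(offset, data)
    @lock.synchronize do
      @io.seek(offset, IO::SEEK_SET)
      @io.write(data)
    end
  end

  # Append a new entry and return the offset at which it starts.
  def append(data)
    @lock.synchronize do
      @io.seek(0, IO::SEEK_END)
      start = @io.pos
      @io.write(data)
      start
    end
  end
end

# Usage with an in-memory file (assumed entry layout, illustrative values).
pos_io = StringIO.new(String.new)
pos_file = GuardedPosFile.new(pos_io)
entry_start = pos_file.append("a.log\t%016x\t%016x\n" % [0, 1])
# Update only the offset field, which starts right after "a.log\t".
pos_file.write_at(entry_start + "a.log\t".bytesize, "%016x" % 0x20)
```

The important design point is that the lock covers the whole seek-then-write sequence, not each call separately.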

I also added a warning for unparsable lines in pos_file.
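A rough sketch of what such a warning could look like is shown below. It is not the code from this patch: the function name `parse_pos_file`, the regular expression, and the warning text are all assumptions, as is the entry layout (path, tab, 16 hex digit offset, tab, 16 hex digit inode).

```ruby
require "logger"
require "stringio"

# Assumed entry layout: path, TAB, 16 hex digit offset, TAB, 16 hex digit inode.
POS_LINE = /\A([^\t]+)\t(\h{16})\t(\h{16})\z/

# Hypothetical parser: returns a hash of path => { pos:, inode: } and
# logs a warning for every line that does not match the expected layout.
def parse_pos_file(io, log)
  entries = {}
  io.each_line do |line|
    line = line.chomp
    match = POS_LINE.match(line)
    unless match
      log.warn("Unparsable line in pos_file: #{line}")
      next
    end
    entries[match[1]] = { pos: match[2].to_i(16), inode: match[3].to_i(16) }
  end
  entries
end
```

Skipping the bad line rather than raising keeps the remaining entries usable, and the warning makes the corruption visible instead of silently ignored.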

…ad/write operations to pos_file

Signed-off-by: Alexey Schurov <aa.schurov@gmail.com>
Signed-off-by: Alexey Schurov <aa.schurov@gmail.com>
@repeatedly
Member

Thanks for the patch.

I added debug log and found the mixing of seek and write operations from different threads.

Does this mean a single in_tail instance can produce a broken pos_file?

@repeatedly repeatedly self-assigned this Sep 4, 2018
@vitclone
Contributor Author

vitclone commented Sep 4, 2018

Yes, we have only one in_tail in our config, and this interleaving of operations is the cause of the inconsistencies. On our server, I found several different situations, for example:

  • writing a new path at a file offset that had been set by update_pos,
  • remembering seek from an offset that had been set by update_pos,
  • remembering last_pos from an offset that had been set by update_pos.
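The first situation in the list can be replayed deterministically on an in-memory file by executing the steps of the two threads by hand in the problematic order. This is only an illustrative sketch: the method name `replay_interleaving`, the file names, and the entry layout (path, tab, 16 hex digit offset, tab, 16 hex digit inode) are assumptions, not taken from Fluentd's code.

```ruby
require "stringio"

# Replays one bad interleaving of two logical threads on a shared handle
# and returns the resulting pos_file content.
def replay_interleaving
  io = StringIO.new(String.new)
  io.write("a.log\t%016x\t%016x\n" % [0x10, 0x1])

  # Step 1 (thread B, adding an entry): seek to the end of the file.
  io.seek(0, IO::SEEK_END)
  # Step 2 (thread A, update_pos): seek to the offset field of the first entry.
  io.seek("a.log\t".bytesize, IO::SEEK_SET)
  # Step 3 (thread B): write the new path, which now lands at thread A's offset.
  io.write("b.log")
  # Step 4 (thread A): write the updated offset right after the misplaced path.
  io.write("%016x" % 0x20)

  io.string
end

corrupted = replay_interleaving
```

Because both logical threads share one file offset, the new path overwrites part of an existing entry, and the result no longer has the path, offset, and inode fields in the assumed layout.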

@repeatedly repeatedly merged commit 19b6cd6 into fluent:master Sep 18, 2018
@repeatedly
Member

Sorry for the delay. Just merged!

@johanneswuerbach

@repeatedly any chance to get a new version including this? Thanks :-)

@repeatedly
Member

@johanneswuerbach sorry for the late reply. v1.2.6 was released last week.
