-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reduce file lookups before purging and fix double referencing files #250
Conversation
Signed-off-by: tabokie <xy.tao@outlook.com>
@yiwu-arbug @Connor1996 Since there's no feedback from upstream, shall we go ahead and do the review ourselves? You can comment on the upstream PR, I will synchronize them manually. |
yes. sad to see no response in upstream. I will take another look later the week. |
Agree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Update HISTORY.md?
- Update PR summary with more details?
Commented on upstream PR. |
Signed-off-by: tabokie <xy.tao@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I assume it is the same as upstream PR.
Signed-off-by: tabokie <xy.tao@outlook.com>
Signed-off-by: Xinye Tao <xy.tao@outlook.com>
a584ade
to
2723739
Compare
Signed-off-by: Xinye Tao <xy.tao@outlook.com> Co-authored-by: Connor <zbk602423539@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just minor comments, LGTM
Signed-off-by: tabokie <xy.tao@outlook.com>
Signed-off-by: tabokie xy.tao@outlook.com
This PR contains a optimization to reduce DB mutex held time when generating obsolete files purging jobs:
Obsolete file candidates consist of
obsolete_files_
that are no longer referenced by any version inVersionSet
, or all physical files in DB directory ifdoing_full_scan
is true. Consider the former case,obsolete_files_
is guaranteed to be obsolete by protocol and thus no need to be checked against live files list. This PR avoids the unnecessary live file lookups when not doing full scan.The optimization assumes a solid reference counting for sst files, i.e. each physical SST file is represented by an unique
FileMetaData
structure. But this assumption is actually broken in existing codebase: A trivially moved file will appear as a new file meta with the same file number. This patch also fix this issue by forwarding moved old files inVersionBuilder
.Aside from that, this PR also assumes SST file won't be moved to a different path with the same file number, and will raise error in that case.
Finally, this PR shows impressive performance improvement in our own testing (thanks @5kbpers). We examine the patch in an artificial workload, where iterators are frequently created and released (active iterator count around 250, RocksDB write OPS around 20K). The logging shows before patching,
AddLiveFiles()
call takes 100+ms and in that process copies 2500K file metas with 60K of them being unique.(left: patched, right: vanilla)
Test Plan:
Benchmark Results:
[TPC-C 5K WH 1 KV 512 threads]
Before: TPM: 23676.8, 50th(ms): 570.4, 90th(ms): 704.6, 95th(ms): 738.2, 99th(ms): 872.4, 99.9th(ms): 5637.1, Max(ms): 12884.9
After: TPM: 24395.1(3%+), 50th(ms): 570.4, 90th(ms): 704.6, 95th(ms): 738.2, 99th(ms): 838.9, 99.9th(ms): 1342.2(76%-), Max(ms): 2952.8(77%-)