Crashes and excessive memory + CPU consumption when dealing with corrupted databases #217
A couple of hours of fuzzing the msan-instrumented version, which runs about an order of magnitude slower than the asan-instrumented version, produced 42 "unique" crashes, all of which reduce to the same location as most other testcases that also trip the asan-instrumented version:
Triaging the testcases flagged as hangs by afl-fuzz (significant CPU time outliers) shows runaway CPU and memory consumption.
The 20+ TB total-vm figures are, at least in part, simply a consequence of the way AddressSanitizer works, but the anon-rss figures are caused by […]
mdbx_chk_asan_hangs_20210707.tar.gz: these files can perform a partial DoS on most computers, you have been warned ;)
Thanks for your attention and for reporting this. From the first part of the traces, I found that a check was missing in at least one place, which is why a damaged meta-page is not skipped and subsequent failures occur as a result. I hope to provide a fix tomorrow during the day.
I've taken the current contents of the devel branch, fb78c5f, for another short test drive, restarting from the previous fuzzing output directory, which means that my input corpus still sucks. I only fuzzed the ASAN-instrumented code. Note that some of these samples trigger the production of 100+ lines of error output by
Once again, thanks for your attention and for reporting this. I have not finished the work yet, which is why there has been no new information here.
@debrouxl, briefly, there were four problems/drawbacks/bugs:
You're right, full iteration of records makes less sense when the page tree is broken. Although high overhead would be tolerable for default-disabled checks, I tend to agree that attempting to make full traversal of a broken database memory-safe is lower priority :) Earlier this morning, I rebuilt the code and started another instance with HEAD @ 4fc6d67. The crash rate is still a bit over 1 per 2K execs, with two different stack traces so far, both use-after-poison. I'll let the fuzzer run for a longer time before posting samples.
On the newest mdbx_chk fuzzing job, barely more than three quarters of the paths are considered examined, but it's been over 3h since the latest so-called 'unique' crash was found, so it's time to post something :) I have packed up all of the files which used to crash, or still crash, mdbx_chk @ 4fc6d67: mdbx_chk_asan_crashes_20210714.tar.gz
No additional stack trace found since my previous message.
Once again, thanks a lot for your attention and for reporting this.
And thanks to you for providing a set of fixes which raise the TTFC above ~1h30 for ASAN-instrumented […] I threw an
@debrouxl, thank you for your work. However, I think that you should not use anything other than […] Nonetheless, a full check of the database pages used can be added as an option, i.e. by an explicit user request.
You're welcome :) Well, the help text for […] While working on corrupted databases is indeed not the main intended use case for […] I've killed the
[Post significantly edited ~13h later.]
Today's status update:
Nevertheless, here's an updated tarball, again a superset of the previous one: mdbx_chk_asan_crashes_20210718.tar.gz. FTR, the
More than 400M execs, and more than 4000 crashes. No new source location for crashes :)
You're welcome :) I can confirm that all of the problems noticed (on
I've now spent a bit of time fuzzing libmdbx, as I fuzzed Berkeley DB, LMDB, GDBM, TDB and other databases inspired by BDB and/or DBM in the past. Sorry, I didn't find out about libmdbx until fairly recently...
In the README and Makefile, I read that you worked on fixing some crashes, and that you have asan & ubsan test targets - so clearly, you paid at least some level of attention to memory safety and UB, which is a good thing.
However, with a Time To First Crash (TTFC) of around 2 minutes, '2021 libmdbx tolerates corrupted databases better than '2018 and '2021 LMDB do (TTFC << 1s on mdb_dump), and marginally better than Berkeley DB 18.1.40 (yes, the latest version, despite dozens of fixes for CVE-numbered issues over the years... I basically gave up reporting issues) does, but libmdbx is not quite fool-proof yet :)
Building and starting a first, simple fuzzing job is straightforward, along the lines of:
(The AFL++ setup, which basically reduces to git clone and make when the build dependencies are installed, is not described here, for brevity.)
I stopped the mdbx_chk fuzzing process a bit after reaching 1M execs. Triaging the crashes already showed 5 unique code locations and SIGBUS, SIGSEGV, weirdness when unpoisoning memory, and use-after-poison through wild pointers: that's enough to warrant creating this issue and providing the information which can enable you to perform your own fuzzing jobs.
The final afl-fuzz output was:
The crash triage output is part of the tarball.
NOTE: in order to reproduce crashes, the best practice is to start from fresh copies of the files. The output of AddressSanitizer killing mdbx_chk on the attached files seems to be stable from one run to the next one (apart from randomized addresses, of course), but, for instance, starting from fresh files is definitely necessary for reproducing a subset of the endless stream of crashes in Berkeley DB.
Ideas for improving the next stages of the fuzzing process:
Looking forward to the fixes which will make libmdbx even more production ready ;)
mdbx_chk_asan_crashes_20210707.tar.gz