Replace dlmalloc with bitmap+buddy allocators #131

Closed
wants to merge 2 commits

Conversation

palainp
Member

palainp commented Jan 9, 2024

Hi devs!
The current memory allocation implementation is dlmalloc, which provides great performance [1] but comes with a large amount of code (~6k LoC) and has recently had some issues with the used-memory computation (the mallinfo it provides is accurate but needs to walk the whole heap, and a fast pre-computed value was wrong due to a bug in my code :( ). My observation is that the complexity of the code makes it hard to get into.

The OCaml 5 support PR shows that OCaml 5 now needs mmap/munmap, which dlmalloc does not provide, and that requires some more work.
Therefore I reused the bitmap allocator from mirage-xen (used there for grant pages) to provide page-aligned memory areas (useful for mmap and posix_memalign). On top of it I added a binary buddy allocator for small memory requests (32 < x < 4096); large requests (>= 4096) fall back to posix_memalign. The code is now shorter (~700 LoC) and should be simpler to understand (I hope).
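
Roughly, the size-based dispatch looks like the following sketch (hypothetical names and stub helpers, the real entry points in nolibc/mm.c differ):

```c
/* Hypothetical sketch of the size-based dispatch described above --
 * names are illustrative and the two helpers are stubs, not the
 * actual nolibc/mm.c code. */
#include <stddef.h>

#define PAGE_SIZE 4096u
#define MIN_BIN   32u   /* smallest buddy bin assumed here */

/* Stubs standing in for the two allocators described above. */
static void *buddy_alloc(size_t size)      { (void)size; return NULL; }
static void *bitmap_alloc_pages(size_t n)  { (void)n;    return NULL; }

void *alloc_dispatch(size_t size)
{
    if (size == 0)
        return NULL;
    if (size >= PAGE_SIZE)
        /* Large request: whole, page-aligned pages from the bitmap
         * allocator (the same path posix_memalign takes). */
        return bitmap_alloc_pages((size + PAGE_SIZE - 1) / PAGE_SIZE);
    /* Small request: round up to the smallest bin and use the buddy. */
    return buddy_alloc(size < MIN_BIN ? MIN_BIN : size);
}
```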

Naturally I'm unable to provide a deep survey of that allocator :(
I ran some basic tests and it works with qubes-mirage-firewall with similar performance; I plan to run longer tests to find more issues.
The bitmap allocator is essentially a first-fit allocator and will probably lead to fragmented memory, which is not great; it may be possible to build a smarter allocator, using linked lists for example, at the cost of more code complexity. Time will tell.
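
To make the first-fit behaviour concrete: the page search is essentially a linear scan of the bitmap for a long-enough run of free bits. A simplified sketch (not the exact mm.c code) could look like:

```c
/* Simplified first-fit scan over a page bitmap: one bit per page,
 * 0 = free, 1 = allocated.  Returns the index of the first run of
 * `npages` consecutive free pages, or -1 if none exists. */
#include <stddef.h>
#include <stdint.h>

long first_fit(const uint8_t *bitmap, size_t total_pages, size_t npages)
{
    size_t run = 0;
    for (size_t i = 0; i < total_pages; i++) {
        int used = (bitmap[i / 8] >> (i % 8)) & 1;
        run = used ? 0 : run + 1;
        if (run == npages)
            return (long)(i - npages + 1);  /* start of the free run */
    }
    return -1;  /* no hole large enough */
}
```

Such a scan is linear in the number of pages, and fragmentation shows up as many short free runs that can no longer satisfy large requests.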

[1]: for example, this performance survey: https://www.researchgate.net/publication/314689739_Experimental_Evaluation_and_Comparison_of_Memory_Allocators_in_the_GNULinux_Operating_System

@dinosaure
Member

The code is well documented indeed, but, as you said, we don't yet know the implications of such a new allocator for our different applications. It could be interesting to test it with a widely used application such as tlstunnel for robur‧coop - however, we are still experimenting with our new utcp stack.

To clarify the goal of this PR: it's to help our upgrade to OCaml 5, am I right?

One possibility (as we did for Mirage) is to maintain two ocaml-solo5 versions: one that remains on OCaml 4.14 and one that is available for OCaml 5. The work for the latter is becoming more and more substantial, and harder to maintain too. I don't know how to manage this in terms of versioning though (/cc @mirage/core).

@palainp
Member Author

palainp commented Jan 15, 2024

Thank you for your feedback!

I can explain a bit about my motivations here: I started writing this patch for the 500-cleaned PR because OCaml 5+ needs mmap/munmap. As there are a lot of changes involved in getting the OCaml 5 runtime up, I wanted to split them and first tried against the current ocaml-solo5 repository targeting OCaml 4.14.
In my mind, fewer lines of code might be better, and this implementation, even if a little naive around the edges, seems at first glance to have sufficient performance. Of course, this claim requires more testing in different contexts, and I'll be happy to help once your tests with utcp are done, if you want to try this out :)

So far I have tested with qubes-mirage-firewall and dns-resolver (the latter is running on a public network, so it has been receiving regular connections without any issues for a bit less than a week).

About this PR, I think it could be an intermediate step with OCaml 4 before moving to OCaml 5. And I agree with you, it'll be a pain to maintain two ocaml-solo5 repositories, but I failed to figure out how to update configure/Makefile to target both compilers, so I currently don't see how to deal with that :(

@palainp
Member Author

palainp commented Feb 6, 2024

I haven't observed any issues yet, so I guess it's ready for review (maybe also relevant to a potential issue in mirage/ocaml-git#631 and the difficulties around bug tracking).

@palainp
Member Author

palainp commented Feb 7, 2024

And FWIW, a couple of notes to keep in mind when reviewing:

  • the current overhead (to store metadata) is hardcoded to 1MB at nolibc/mm.c:515; it could probably be reduced or calculated dynamically from the available memory, and the same goes for the reserved stack size
  • there is an assertion to avoid metadata corruption at nolibc/mm.c:634 and nolibc/mm.c:679. Strictly speaking malloc shouldn't crash the application but return NULL; my position however was that if we don't have enough room to keep the bookkeeping for one more page allocation we're in trouble anyway, and it's better to crash...
  • for "large" requests (at least 4kB), the algorithm is "first fit" and its complexity is linear in the number of already allocated pages (which may lead to memory fragmentation); for "small" requests (less than 4kB), the algorithm is linear in the number of pages reserved for small requests to find a bin where the request can fit (if none is found it reserves another page, see above :) ), then loglinear to find room for the request within that page
  • I still haven't implemented something like the FOOTER in dlmalloc.i, so memory pages and small-request blocks sit side by side (i.e. an overflow will write into the next memory area, whereas with FOOTER it would corrupt a magic constant first). It could be implemented at the cost of more memory consumption :) (see the sketch after this list)
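
To illustrate the FOOTER idea from the last point (a hypothetical sketch, not code from this PR): each block would get a trailing magic word written at allocation time and checked on free, so an overflow is detected instead of silently corrupting the next block.

```c
/* Hypothetical footer/canary scheme in the spirit of dlmalloc's FOOTER
 * option -- not part of this PR.  Each block reserves sizeof(uint32_t)
 * extra bytes holding a magic value that is verified on free. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FOOTER_MAGIC 0xDEADBEEFu

void footer_write(void *block, size_t user_size)
{
    uint32_t magic = FOOTER_MAGIC;
    memcpy((char *)block + user_size, &magic, sizeof magic);
}

void footer_check(const void *block, size_t user_size)
{
    uint32_t magic;
    memcpy(&magic, (const char *)block + user_size, sizeof magic);
    /* Any write past user_size overwrites the magic word and trips here. */
    assert(magic == FOOTER_MAGIC);
}
```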

@palainp
Member Author

palainp commented Mar 8, 2024

I continued testing and it seems the O(n) time complexity should be avoided (at least with qubes-mirage-firewall, I observed bandwidth degradation over time).

Another algorithm such as TLSF (e.g. https://github.com/mattconte/tlsf) is probably a better candidate; it also seems to offer good performance with low fragmentation :) https://www.researchgate.net/publication/234785757_A_comparison_of_memory_allocators_for_real-time_applications
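
For context, the reason TLSF stays O(1) is its two-level mapping from a request size to a (first-level, second-level) bin index; a minimal sketch of that mapping (my own illustration, not code from mattconte/tlsf) is:

```c
/* Minimal sketch of TLSF's O(1) size -> (fl, sl) bin mapping, assuming
 * 32 second-level subdivisions and size >= 32 (real implementations
 * handle small sizes and rounding separately). */
#include <stddef.h>

#define SL_LOG2 5   /* 2^5 = 32 second-level bins per first-level bin */

static int fls_size(size_t x)   /* index of the highest set bit */
{
    int r = -1;
    while (x) { x >>= 1; r++; }
    return r;
}

void tlsf_mapping(size_t size, int *fl, int *sl)
{
    *fl = fls_size(size);
    /* Slice the range [2^fl, 2^(fl+1)) into 32 equal sub-ranges. */
    *sl = (int)((size >> (*fl - SL_LOG2)) - (1u << SL_LOG2));
}
```

Finding a free block is then two find-first-set lookups over bitmaps of non-empty bins, independent of how many blocks are currently allocated.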

@palainp palainp closed this Mar 8, 2024