Replace dlmalloc with bitmap+buddy allocators #131

Closed
wants to merge 2 commits

Conversation

palainp
Member

palainp commented Jan 9, 2024

Hi devs!
The current memory allocation implementation is dlmalloc, which provides great performance [1] but comes with a large amount of code (~6k LoC) and has recently had some issues with the used-memory computation (the mallinfo it provides is accurate but needs to walk the whole heap, and a fast pre-computed value was wrong due to a bug in my code :( ). My observation is that the complexity of the code makes it hard to get into.

The OCaml 5 support PR shows that OCaml 5 now needs mmap/munmap, which dlmalloc does not provide, and that requires some more work.
Therefore I reused the bitmap allocator from mirage-xen (used there for grant pages) to provide page-aligned memory areas (useful for mmap and posix_memalign). On top of it I added a binary buddy allocator for small memory requests (32 < x < 4096); large requests (>= 4096) fall back to posix_memalign. The code is now shorter (~700 LoC) and should be simpler to understand (I hope).
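
Roughly, the size-based dispatch looks like the following sketch (hypothetical names and stub helpers, the real entry points in nolibc/mm.c differ):

```c
/* Hypothetical sketch of the size-based dispatch described above --
 * names are illustrative and the two helpers are stubs, not the
 * actual nolibc/mm.c code. */
#include <stddef.h>

#define PAGE_SIZE 4096u
#define MIN_BIN   32u   /* smallest buddy bin assumed here */

/* Stubs standing in for the two allocators described above. */
static void *buddy_alloc(size_t size)      { (void)size; return NULL; }
static void *bitmap_alloc_pages(size_t n)  { (void)n;    return NULL; }

void *alloc_dispatch(size_t size)
{
    if (size == 0)
        return NULL;
    if (size >= PAGE_SIZE)
        /* Large request: whole, page-aligned pages from the bitmap
         * allocator (the same path posix_memalign takes). */
        return bitmap_alloc_pages((size + PAGE_SIZE - 1) / PAGE_SIZE);
    /* Small request: round up to the smallest bin and use the buddy. */
    return buddy_alloc(size < MIN_BIN ? MIN_BIN : size);
}
```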

Naturally I'm unable to provide a deep survey of that allocator :(
I ran some basic tests and it works with qubes-mirage-firewall with similar performance; I plan to run longer tests to find more issues.
The bitmap allocator is essentially a first-fit allocator and will probably lead to fragmented memory, which is not great; it may be possible to build a smarter allocator, using linked lists for example, at the cost of more code complexity. Time will tell.
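
To make the first-fit behaviour concrete: the page search is essentially a linear scan of the bitmap for a long-enough run of free bits. A simplified sketch (not the exact mm.c code) could look like:

```c
/* Simplified first-fit scan over a page bitmap: one bit per page,
 * 0 = free, 1 = allocated.  Returns the index of the first run of
 * `npages` consecutive free pages, or -1 if none exists. */
#include <stddef.h>
#include <stdint.h>

long first_fit(const uint8_t *bitmap, size_t total_pages, size_t npages)
{
    size_t run = 0;
    for (size_t i = 0; i < total_pages; i++) {
        int used = (bitmap[i / 8] >> (i % 8)) & 1;
        run = used ? 0 : run + 1;
        if (run == npages)
            return (long)(i - npages + 1);  /* start of the free run */
    }
    return -1;  /* no hole large enough */
}
```

Such a scan is linear in the number of pages, and fragmentation shows up as many short free runs that can no longer satisfy large requests.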

[1]: for example, this performance survey: https://www.researchgate.net/publication/314689739_Experimental_Evaluation_and_Comparison_of_Memory_Allocators_in_the_GNULinux_Operating_System

@dinosaure
Member

The code is well documented indeed, but, as you said, we don't yet know the implications of such a new allocator for our different applications. It could be interesting to test it with a widely used application such as tlstunnel for robur‧coop - however, we are still experimenting with our new utcp stack.

To clarify the goal of this PR: it's to help our upgrade to OCaml 5, am I right?

One possibility (as we did for Mirage) is to maintain two ocaml-solo5 versions: one that remains on OCaml 4.14 and one that is available for OCaml 5. The work for the latter is becoming more and more substantial, and harder to maintain too. I don't know how to manage this in terms of versioning though (/cc @mirage/core).

@palainp
Member Author

palainp commented Jan 15, 2024

Thank you for your feedback!

I can explain a bit about my motivations here: I started writing this patch for the 500-cleaned PR because OCaml 5+ needs mmap/munmap. As there are a lot of changes involved in getting the OCaml 5 runtime up, I wanted to split them and first tried against the current ocaml-solo5 repository targeting OCaml 4.14.
In my mind, fewer lines of code might be better, and this implementation, even if a little naive around the edges, seems at first glance to have sufficient performance. Of course, this claim requires more testing in different contexts, and I'll be happy to help once your tests with utcp are done, if you want to try this out :)

So far I have tested with qubes-mirage-firewall and dns-resolver (the latter is running on a public network, so it has been receiving regular connections without any issues for a bit less than a week).

About this PR, I think it could be an intermediate step with OCaml 4 before moving to OCaml 5. And I agree with you, it'll be a pain to maintain two ocaml-solo5 repositories, but I failed to figure out how to update configure/Makefile to target both compilers, so I currently don't see how to deal with that :(

@palainp
Member Author

palainp commented Feb 6, 2024

I haven't observed any issues yet, so I guess it's ready for review (maybe also relevant to a potential issue in mirage/ocaml-git#631 and the difficulties around bug tracking).

@palainp
Member Author

palainp commented Feb 7, 2024

And FWIW, a couple of notes to keep in mind when reviewing:

  • the current overhead (to store metadata) is hardcoded to 1MB at nolibc/mm.c:515; it could probably be reduced or calculated dynamically from the available memory, and the same goes for the reserved stack size
  • there is an assertion to avoid metadata corruption at nolibc/mm.c:634 and nolibc/mm.c:679. Strictly speaking malloc shouldn't crash the application but return NULL; my position however was that if we don't have enough room to keep the bookkeeping for one more page allocation we're in trouble anyway, and it's better to crash...
  • for "large" requests (at least 4kB), the algorithm is "first fit" and its complexity is linear in the number of already allocated pages (which may lead to memory fragmentation); for "small" requests (less than 4kB), the algorithm is linear in the number of pages reserved for small requests to find a bin where the request can fit (if none is found it reserves another page, see above :) ), then loglinear to find room for the request within that page
  • I still haven't implemented something like the FOOTER in dlmalloc.i, so memory pages and small-request blocks sit side by side (i.e. an overflow will write into the next memory area, whereas with FOOTER it would corrupt a magic constant first). It could be implemented at the cost of more memory consumption :) (see the sketch after this list)
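
To illustrate the FOOTER idea from the last point (a hypothetical sketch, not code from this PR): each block would get a trailing magic word written at allocation time and checked on free, so an overflow is detected instead of silently corrupting the next block.

```c
/* Hypothetical footer/canary scheme in the spirit of dlmalloc's FOOTER
 * option -- not part of this PR.  Each block reserves sizeof(uint32_t)
 * extra bytes holding a magic value that is verified on free. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FOOTER_MAGIC 0xDEADBEEFu

void footer_write(void *block, size_t user_size)
{
    uint32_t magic = FOOTER_MAGIC;
    memcpy((char *)block + user_size, &magic, sizeof magic);
}

void footer_check(const void *block, size_t user_size)
{
    uint32_t magic;
    memcpy(&magic, (const char *)block + user_size, sizeof magic);
    /* Any write past user_size overwrites the magic word and trips here. */
    assert(magic == FOOTER_MAGIC);
}
```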

@palainp
Member Author

palainp commented Mar 8, 2024

I continued testing and it seems the O(n) time complexity should be avoided (at least with qubes-mirage-firewall, I observed bandwidth degradation over time).

Another algorithm such as TLSF (e.g. https://github.com/mattconte/tlsf) is probably a better candidate; it also seems to offer good performance with low fragmentation :) https://www.researchgate.net/publication/234785757_A_comparison_of_memory_allocators_for_real-time_applications
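
For context, the reason TLSF stays O(1) is its two-level mapping from a request size to a (first-level, second-level) bin index; a minimal sketch of that mapping (my own illustration, not code from mattconte/tlsf) is:

```c
/* Minimal sketch of TLSF's O(1) size -> (fl, sl) bin mapping, assuming
 * 32 second-level subdivisions and size >= 32 (real implementations
 * handle small sizes and rounding separately). */
#include <stddef.h>

#define SL_LOG2 5   /* 2^5 = 32 second-level bins per first-level bin */

static int fls_size(size_t x)   /* index of the highest set bit */
{
    int r = -1;
    while (x) { x >>= 1; r++; }
    return r;
}

void tlsf_mapping(size_t size, int *fl, int *sl)
{
    *fl = fls_size(size);
    /* Slice the range [2^fl, 2^(fl+1)) into 32 equal sub-ranges. */
    *sl = (int)((size >> (*fl - SL_LOG2)) - (1u << SL_LOG2));
}
```

Finding a free block is then two find-first-set lookups over bitmaps of non-empty bins, independent of how many blocks are currently allocated.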

@palainp palainp closed this Mar 8, 2024