mimalloc: A new, high-performance, scalable memory allocator for the modern era
Summary
mimalloc is an open-source, scalable memory allocator developed by Microsoft Research's RiSE group, designed as a drop-in replacement for `malloc` and `free`. Initially created in 2019 for the runtime systems of programming languages like Lean and Koka, it has since been adopted by large services such as Bing, by the free-threaded (no-GIL) build of CPython 3.13+, by Unreal Engine, and by games like Death Stranding. The allocator is compact, at approximately 12K lines of C code, and favors clear internal data structures that make it easier to understand and to port across platforms like Windows, macOS, and Linux. To minimize synchronization, mimalloc uses thread-local heaps ("theaps") and pages of fixed-size blocks (typically 64 KiB per page), which give small blocks fast allocation and deallocation paths. A "page stealing" technique balances scalability with efficient cross-thread memory sharing; in benchmarks, committed memory is about 1.3x the live data.
Key takeaway
For MLOps Engineers and AI Engineers deploying highly concurrent services with large memory footprints, adopting mimalloc can significantly improve response times and optimize memory utilization. Its design, which balances scalability with efficient cross-thread memory sharing, means your applications can handle hundreds of threads and hundreds of gigabytes of memory more effectively. Consider integrating mimalloc to reduce committed memory overhead and enhance performance in demanding workloads.
Key insights
mimalloc balances high scalability and efficient memory sharing through thread-local heaps and page stealing.
Principles
- Thread-local heaps minimize synchronization.
- Clear data structures enhance portability and reasoning.
- Randomized approaches can simplify complex balancing.
Method
mimalloc uses thread-local heaps with fixed-size block pages and three free lists per page. It optimizes small allocations via a fast path and employs atomic compare-and-swap for cross-thread freeing, alongside a "page stealing" technique for memory sharing.
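The three free lists and the compare-and-swap free can be illustrated with a minimal C sketch. This is not mimalloc's source: the type and function names (`page_t`, `page_alloc`, `page_free_local`, `page_free_remote`, `page_collect`) and the exact bookkeeping are assumptions made for illustration. It assumes one list feeds allocation, one collects frees from the owning thread, and one collects frees from other threads, so that only the last needs atomic operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* A free block stores the link to the next free block in its own memory. */
typedef struct block_s { struct block_s *next; } block_t;

/* Hypothetical page layout: one block size per page, three free lists. */
typedef struct page_s {
    block_t *free;                   /* allocation list, owner thread only */
    block_t *local_free;             /* blocks freed by the owner thread */
    _Atomic(block_t *) thread_free;  /* blocks freed by other threads */
    size_t used;                     /* blocks the owner counts as handed out */
} page_t;

/* Carve a raw buffer into fixed-size blocks and link them onto `free`. */
static void page_init(page_t *page, void *mem, size_t block_size, size_t count) {
    assert(block_size >= sizeof(block_t));
    page->free = NULL;
    page->local_free = NULL;
    atomic_init(&page->thread_free, NULL);
    page->used = 0;
    for (size_t i = count; i-- > 0; ) {
        block_t *b = (block_t *)((uint8_t *)mem + i * block_size);
        b->next = page->free;
        page->free = b;
    }
}

/* Slow path, owner thread only, called when `free` is empty:
 * refill `free` from the two deferred lists. */
static void page_collect(page_t *page) {
    /* Take the whole cross-thread list with a single atomic exchange. */
    block_t *b = atomic_exchange_explicit(&page->thread_free, NULL,
                                          memory_order_acquire);
    while (b != NULL) {
        block_t *next = b->next;
        b->next = page->local_free;
        page->local_free = b;
        page->used--;  /* remote frees are only counted here */
        b = next;
    }
    page->free = page->local_free;
    page->local_free = NULL;
}

/* Fast path: pop from `free` with no atomics and no locks. */
static void *page_alloc(page_t *page) {
    if (page->free == NULL) {
        page_collect(page);
        if (page->free == NULL) return NULL;  /* page is full */
    }
    block_t *b = page->free;
    page->free = b->next;
    page->used++;
    return b;
}

/* Free from the owning thread: a plain push, no atomics. */
static void page_free_local(page_t *page, void *p) {
    block_t *b = (block_t *)p;
    b->next = page->local_free;
    page->local_free = b;
    page->used--;
}

/* Free from any other thread: lock-free push via compare-and-swap. */
static void page_free_remote(page_t *page, void *p) {
    block_t *b = (block_t *)p;
    block_t *head = atomic_load_explicit(&page->thread_free,
                                         memory_order_relaxed);
    do {
        b->next = head;  /* on CAS failure, `head` holds the fresh value */
    } while (!atomic_compare_exchange_weak_explicit(
                 &page->thread_free, &head, b,
                 memory_order_release, memory_order_relaxed));
}
```

In this sketch the allocation path touches only owner-private state, and contention is confined to the single `thread_free` pointer of the page being freed into. A real allocator also has to locate the page for a given pointer and decide when a page can be released, which is omitted here.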
In practice
- Use mimalloc as a `malloc`/`free` replacement.
- Run the free-threaded (no-GIL) build of CPython 3.13+, which uses mimalloc as its allocator.
- Apply in game engines like Unreal Engine.
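As a sketch of what "drop-in replacement" means in practice, the C function below calls only the standard `malloc`, `realloc`, and `free`. The function name `sum_of_squares` and the workload are hypothetical. The point is that code like this needs no changes: the allocator is swapped when the program is linked or loaded. On Linux, for example, the mimalloc README describes preloading the shared library with `LD_PRELOAD`; the library path shown in the comment is an assumption that depends on how mimalloc was installed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical workload: sum the squares 0^2 + 1^2 + ... + n^2 using a
 * heap-allocated buffer. Only standard allocation calls are used, so
 * whichever allocator is linked or preloaded serves these requests.
 *
 * Assumed invocation with mimalloc preloaded (path may differ):
 *   LD_PRELOAD=/usr/lib/libmimalloc.so ./myprogram
 */
static uint64_t sum_of_squares(size_t n) {
    if (n == 0) return 0;
    /* Guard against overflow when sizing the (n + 1)-element buffer. */
    assert(n < SIZE_MAX / sizeof(uint64_t));

    uint64_t *values = malloc(n * sizeof *values);
    if (values == NULL) abort();
    for (size_t i = 0; i < n; i++) values[i] = (uint64_t)i * i;

    /* Grow the buffer by one element to exercise realloc as well. */
    uint64_t *grown = realloc(values, (n + 1) * sizeof *grown);
    if (grown == NULL) abort();
    grown[n] = (uint64_t)n * n;

    uint64_t total = 0;
    for (size_t i = 0; i <= n; i++) total += grown[i];
    free(grown);
    return total;
}
```

mimalloc also ships its own header, `<mimalloc.h>`, with explicit entry points such as `mi_malloc` and `mi_free` for programs that prefer to call it directly rather than override the standard functions.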
Topics
- mimalloc
- Memory Allocator
- Thread-Local Heaps
- Atomic Operations
- Free List Sharding
Best for: MLOps Engineer, AI Engineer, NLP Engineer, Software Engineer, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Research.