mimalloc: A new, high-performance, scalable memory allocator for the modern era

2026-05-13 · Source: Microsoft Research · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning · Depth: Advanced, long

Summary

mimalloc is an open-source, scalable memory allocator developed by Microsoft Research's RiSE group and designed as a drop-in replacement for `malloc` and `free`. Initially created in 2020 for programming languages like Lean and Koka, it has since been adopted by large services such as Bing, by NoGIL (free-threaded) CPython 3.13+, by Unreal Engine, and by games such as Death Stranding. The allocator is compact, at approximately 12K lines of C code, and keeps its internal data structures clear so that it is easier to understand and to port across platforms such as Windows, macOS, and Linux. mimalloc uses thread-local heaps ("theaps") and pages of fixed-size blocks (typically 64 KiB per page) to minimize synchronization, which gives small blocks fast allocation and deallocation paths. A "page stealing" technique balances scalability against efficient cross-thread memory sharing; in benchmarks, committed memory is about 1.3x the live data.

Key takeaway

For MLOps Engineers and AI Engineers deploying highly concurrent services with large memory footprints, adopting mimalloc can improve response times and memory utilization. Because its design balances scalability with efficient cross-thread memory sharing, it targets applications that run hundreds of threads over hundreds of gigabytes of memory. Consider integrating mimalloc to reduce committed-memory overhead and improve performance in demanding workloads.

Key insights

mimalloc balances high scalability and efficient memory sharing through thread-local heaps and page stealing.

Principles

Thread-local heaps minimize synchronization.
Clear data structures enhance portability and reasoning.
Randomized approaches can simplify complex balancing.

Method

mimalloc uses thread-local heaps with fixed-size block pages and three free lists per page. It optimizes small allocations via a fast path and employs atomic compare-and-swap for cross-thread freeing, alongside a "page stealing" technique for memory sharing.
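
The method above can be sketched in a few dozen lines of C. This is a simplified illustration, not mimalloc's actual source: the type and function names (`page_t`, `page_alloc`, `page_free_local`, `page_free_remote`, `page_collect`) and the field names are hypothetical, and draining the cross-thread list with a single atomic exchange is an assumption made for brevity. It shows one page of fixed-size blocks with three free lists, an allocation fast path that touches no atomics, and a compare-and-swap push for frees that come from other threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; not mimalloc's real structures. */
typedef struct block_s { struct block_s *next; } block_t;

typedef struct page_s {
    block_t *free;                   /* blocks ready to hand out (owner thread only) */
    block_t *local_free;             /* blocks freed by the owner thread */
    _Atomic(block_t *) thread_free;  /* blocks freed by other threads */
} page_t;

/* Carve `area` into `count` blocks of `block_size` bytes and link them.
   `area` must be pointer-aligned and `block_size` a multiple of that alignment. */
static void page_init(page_t *page, void *area, size_t block_size, size_t count) {
    assert(block_size >= sizeof(block_t));
    page->free = NULL;
    page->local_free = NULL;
    atomic_init(&page->thread_free, NULL);
    for (size_t i = count; i > 0; i--) {
        block_t *b = (block_t *)((uint8_t *)area + (i - 1) * block_size);
        b->next = page->free;
        page->free = b;
    }
}

/* Slow path, called only when `free` is empty: take the whole cross-thread
   list in one atomic exchange, merge it into local_free, then reuse it. */
static void page_collect(page_t *page) {
    block_t *b = atomic_exchange_explicit(&page->thread_free, NULL,
                                          memory_order_acquire);
    while (b != NULL) {
        block_t *next = b->next;
        b->next = page->local_free;
        page->local_free = b;
        b = next;
    }
    page->free = page->local_free;
    page->local_free = NULL;
}

/* Fast path: pop from `free` with no atomic operations. */
static void *page_alloc(page_t *page) {
    if (page->free == NULL) {
        page_collect(page);
        if (page->free == NULL) return NULL;  /* page is full */
    }
    block_t *b = page->free;
    page->free = b->next;
    return b;
}

/* Free from the owning thread: plain push, no synchronization. */
static void page_free_local(page_t *page, void *p) {
    block_t *b = (block_t *)p;
    b->next = page->local_free;
    page->local_free = b;
}

/* Free from another thread: lock-free push using compare-and-swap. */
static void page_free_remote(page_t *page, void *p) {
    block_t *b = (block_t *)p;
    block_t *head = atomic_load_explicit(&page->thread_free, memory_order_relaxed);
    do {
        b->next = head;  /* on CAS failure, `head` is reloaded automatically */
    } while (!atomic_compare_exchange_weak_explicit(
                 &page->thread_free, &head, b,
                 memory_order_release, memory_order_relaxed));
}
```

Keeping owner-thread frees on their own list means the common case never contends with other threads; only `page_free_remote` and the occasional `page_collect` touch the shared atomic field.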

In practice

Use mimalloc as a `malloc`/`free` replacement.
Run concurrent Python workloads on NoGIL CPython 3.13+, which has adopted mimalloc.
Apply in game engines like Unreal Engine.
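
As a rough illustration of the first point, the sketch below is a workload that calls only the standard `malloc` and `free`. The function name `churn_small_blocks` and the batch size are hypothetical. Nothing in it refers to mimalloc, which is the point of a drop-in replacement: the same code is served by whichever allocator the process loads.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical workload: repeatedly allocate and free a batch of small
   blocks through the standard C API. Returns how many allocations succeeded. */
size_t churn_small_blocks(size_t rounds, size_t block_size) {
    enum { BATCH = 256 };
    void *batch[BATCH];
    size_t served = 0;

    assert(block_size > 0);
    for (size_t r = 0; r < rounds; r++) {
        for (size_t i = 0; i < BATCH; i++) {
            batch[i] = malloc(block_size);
            if (batch[i] != NULL) {
                memset(batch[i], 0xAB, block_size);  /* touch the memory */
                served++;
            }
        }
        for (size_t i = 0; i < BATCH; i++) {
            free(batch[i]);  /* free(NULL) is a no-op */
        }
    }
    return served;
}
```

On Linux, a program built around a function like this can typically be switched to mimalloc without recompiling by preloading the shared library, for example `LD_PRELOAD=/usr/lib/libmimalloc.so ./app`. The library path here is an assumption and depends on how mimalloc was installed. Setting `MIMALLOC_VERBOSE=1` in the environment should make mimalloc print startup messages, which is a quick way to confirm that the override took effect.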

Topics

mimalloc
Memory Allocator
Thread-Local Heaps
Atomic Operations
Free List Sharding

Code references

microsoft/mimalloc

Best for: MLOps Engineer, AI Engineer, NLP Engineer, Software Engineer, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Research.