Principles of Mechanical Sympathy

2026-04-07 · Source: Martin Fowler · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning · Depth: Advanced, medium

Summary

Mechanical sympathy, a software optimization practice popularized by Martin Thompson, means designing software that works with its underlying hardware rather than against it. The article, published on April 7, 2026, distills the practice into four core principles: predictable memory access, awareness of cache lines to prevent false sharing, the single-writer principle, and natural batching. The principles apply across systems ranging from AI inference servers running billion-parameter models on laptops to distributed data platforms. Predictable memory access exploits the CPU cache hierarchy. The single-writer principle, demonstrated with an ONNX text embedding service, gives one dedicated actor thread ownership of the resource and has other threads reach it through asynchronous messaging, which avoids mutex overhead and head-of-line blocking. Natural batching builds on this by greedily forming a batch from whatever requests are already waiting, which the article reports can perform up to twice as well as timeout-based batching.
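
Natural batching can be sketched in a few lines. The snippet below is a minimal illustration, not the article's implementation: the function name `take_natural_batch` and the `max_batch` parameter are hypothetical, and Python's standard `queue.Queue` stands in for whatever message channel a real service would use. The consumer blocks only for the first item, then takes whatever is already queued without waiting on a timer.

```python
import queue


def take_natural_batch(inbox, max_batch):
    """Block for the first item, then greedily drain what is already queued."""
    batch = [inbox.get()]  # wait only when there is nothing at all to do
    while len(batch) < max_batch:
        try:
            batch.append(inbox.get_nowait())
        except queue.Empty:
            break  # queue drained: ship the batch now instead of waiting for a timeout
    return batch


# Usage: five requests are already waiting, the batch size cap is three.
inbox = queue.Queue()
for request_id in range(5):
    inbox.put(request_id)

first = take_natural_batch(inbox, max_batch=3)
second = take_natural_batch(inbox, max_batch=3)
```

Under light load a batch may hold a single item and is processed immediately. Under heavy load batches fill up to the cap. A timeout-based strategy would instead hold an incomplete batch until its timer fires.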

Key takeaway

For machine learning engineers and software architects optimizing high-performance systems, mechanical sympathy offers a practical set of design rules. Establish observability before optimizing: define SLIs, SLOs, and SLAs so that measurements, not guesses, guide the work. Applying predictable memory access, avoiding false sharing, and implementing the single-writer principle with natural batching can significantly raise throughput and reduce latency, even for complex AI models, and helps your software make fuller use of modern hardware.

Key insights

Mechanical sympathy optimizes software by aligning its design with how the underlying hardware actually works.

Principles

Design for predictable, sequential memory access.
Prevent false sharing by understanding cache lines.
Apply the single-writer principle for concurrency.
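
The first principle comes down to the order in which memory is visited. The sketch below is an illustration under stated assumptions, not code from the article: the function names and the 256 x 256 grid are hypothetical. Both functions compute the same sum over one contiguous buffer. The first walks consecutive addresses, while the second jumps a whole row ahead on every step. In CPython the interpreter overhead mutes the timing difference, so read this as a picture of the access pattern rather than a benchmark. The gap is much larger in compiled code or vectorized libraries.

```python
from array import array

ROWS, COLS = 256, 256
grid = array("d", range(ROWS * COLS))  # one contiguous, row-major buffer of doubles


def sum_row_major(buf, rows, cols):
    """Visit elements in storage order: each access is next to the previous one."""
    total = 0.0
    for r in range(rows):
        base = r * cols
        for c in range(cols):
            total += buf[base + c]
    return total


def sum_column_major(buf, rows, cols):
    """Visit the same elements, but stride `cols` elements (cols * 8 bytes) each step."""
    total = 0.0
    for c in range(cols):
        for r in range(rows):
            total += buf[r * cols + c]
    return total


sequential_total = sum_row_major(grid, ROWS, COLS)
strided_total = sum_column_major(grid, ROWS, COLS)
```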

Method

Refactor multithreaded systems by dedicating a single "actor" thread to own all writes to a resource. Other threads submit their writes to that actor through asynchronous messages instead of touching the resource directly.
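
The method can be sketched with Python's standard library. This is a minimal sketch under stated assumptions, not the article's ONNX service: the class `SingleWriterActor`, its `submit_write` method, and the counter-style state are all hypothetical. `queue.Queue` does lock internally, so the point here is the ownership model: the write logic itself needs no mutex because exactly one thread ever runs it.

```python
import queue
import threading
from concurrent.futures import Future


class SingleWriterActor:
    """Owns `_state`; only the actor thread ever mutates it."""

    _STOP = object()  # sentinel message that tells the actor thread to exit

    def __init__(self):
        self._state = {}             # the resource being protected
        self._inbox = queue.Queue()  # messages submitted by other threads
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit_write(self, key, delta):
        """Callable from any thread: enqueue a write and return immediately."""
        done = Future()
        self._inbox.put((key, delta, done))
        return done

    def stop(self):
        self._inbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            msg = self._inbox.get()
            if msg is self._STOP:
                return
            key, delta, done = msg
            self._state[key] = self._state.get(key, 0) + delta  # sole writer
            done.set_result(self._state[key])


def producer(actor, count):
    for _ in range(count):
        actor.submit_write("hits", 1)


actor = SingleWriterActor()
producers = [threading.Thread(target=producer, args=(actor, 1000)) for _ in range(4)]
for t in producers:
    t.start()
for t in producers:
    t.join()

# Messages are handled in FIFO order, so this read sees all 4000 earlier writes.
total = actor.submit_write("hits", 0).result()
actor.stop()
```

Callers that need the outcome of a write wait on the returned `Future`. Callers that do not can ignore it and move on.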

In practice

Scan entire databases sequentially, then filter.
Pad independently written data out to full cache lines to prevent false sharing.
Use actors for single-writer concurrency.
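
The padding item can be illustrated with a buffer layout. The sketch below makes several assumptions that are not from the article: a 64-byte cache line (common on x86-64 but not universal), a buffer whose base address is at least 8-byte aligned, and hypothetical helper names. Each worker gets a counter at the start of its own 64-byte slot, so two workers never write to the same line. In single-process CPython the layout has no practical effect. It matters when the same layout backs shared memory written by several processes, or in native code.

```python
import struct

CACHE_LINE = 64                # bytes; assumed, check the target CPU
COUNTER = struct.Struct("<q")  # one signed 64-bit counter per worker


def make_padded_counters(n_workers):
    """Allocate one full cache line per worker instead of 8 packed bytes."""
    return bytearray(n_workers * CACHE_LINE)


def slot_offset(worker_id):
    """Byte offset of a worker's counter: the start of its own cache line."""
    return worker_id * CACHE_LINE


def increment(buf, worker_id, delta=1):
    offset = slot_offset(worker_id)
    (value,) = COUNTER.unpack_from(buf, offset)
    COUNTER.pack_into(buf, offset, value + delta)


def read_counter(buf, worker_id):
    return COUNTER.unpack_from(buf, slot_offset(worker_id))[0]


counters = make_padded_counters(4)
increment(counters, 0)
increment(counters, 3, delta=5)
```

The cost is memory: four counters occupy 256 bytes instead of 32, which is usually a fair trade for data that several cores write frequently.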

Topics

Mechanical Sympathy
Performance Optimization
CPU Cache Management
Single Writer Principle
Natural Batching
AI Inference Optimization

Best for: Software Engineer, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Martin Fowler.