Open Thread 437

2026-06-08 · Source: Astral Codex Ten · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, short

Summary

EXO is an open-source tool for building peer-to-peer AI clusters from everyday devices such as MacBooks and Raspberry Pis, with the goal of cutting cloud inference costs and keeping data local. It automatically discovers devices on the local network and shards AI models across them using tensor and pipeline parallelism, pooling their memory so the cluster can run models larger than any single device holds and can speed up inference. It supports mixed hardware: on macOS it uses the GPU through Apple's MLX framework, while Linux systems default to CPU inference. On compatible Apple hardware, EXO offers day-zero support for RDMA over Thunderbolt 5, which enables direct GPU-to-GPU memory transfers and significantly lowers latency. Reported benchmarks include four M3 Ultra Mac Studios reaching about 32 tokens/second on Qwen 3 235B, and EXO Labs running DeepSeek V3 671B on eight M4 Mac minis with 512 GB of pooled memory.
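As a rough check on why pooling matters for a model like DeepSeek V3 671B, the sketch below estimates the weight footprint at a few precisions and compares it with 512 GB of pooled memory. The bytes-per-parameter values and the helper names (weight_footprint_gb, fits) are illustrative assumptions: the source does not say which precision was used, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Back-of-envelope check: does a model's weight footprint fit in pooled memory?
# Assumptions (not from the source): weights dominate memory use, and these
# bytes-per-parameter values stand in for common quantization levels.

BYTES_PER_PARAM = {"fp16": 2.0, "8bit": 1.0, "4bit": 0.5}


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits(num_params: float, precision: str, pooled_gb: float) -> bool:
    """True if the weights alone fit in the pooled memory budget."""
    return weight_footprint_gb(num_params, precision) <= pooled_gb


POOLED_GB = 512.0  # pooled memory reported for the eight-Mac-mini cluster
PARAMS = 671e9     # DeepSeek V3 parameter count

for precision in ("fp16", "8bit", "4bit"):
    size = weight_footprint_gb(PARAMS, precision)
    print(f"{precision}: {size:.1f} GB, fits: {fits(PARAMS, precision, POOLED_GB)}")
# fp16: 1342.0 GB, fits: False
# 8bit: 671.0 GB, fits: False
# 4bit: 335.5 GB, fits: True
```

Under these assumptions only a roughly 4-bit model (about 335.5 GB) fits, and even that is several times larger than one machine's average share of the pool (512 GB / 8 = 64 GB), which is why the weights must be sharded across devices.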

Key takeaway

For Machine Learning Engineers seeking to reduce cloud inference costs or strengthen data privacy, EXO is a compelling alternative. You can deploy this open-source tool to build peer-to-peer AI clusters from your existing MacBooks, Raspberry Pis, or other local hardware. This lets you run large models such as DeepSeek V3 671B locally, using pooled memory and, on supported Apple devices, direct GPU-to-GPU transfers. Consider evaluating EXO to gain more control over your AI workloads and to experiment freely without ongoing cloud bills.

Key insights

EXO enables local, cost-effective AI inference by pooling existing hardware into a peer-to-peer cluster, reducing cloud dependency.

Principles

Local P2P AI inference is practical.
Mixed hardware can form effective clusters.
Distributed sharding boosts local model performance.
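To make "sharding" concrete, here is a minimal sketch of tensor (intra-layer) parallelism for a single linear layer, written in plain Python with no dependencies. It is a generic illustration, not EXO's implementation: each hypothetical "device" owns a contiguous block of the weight matrix's rows, computes its slice of the output from the full input, and the slices are concatenated, standing in for the all-gather a real cluster would perform over the network. The function names are hypothetical.

```python
# Generic tensor-parallel matrix-vector product (illustrative, not EXO's code).
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Single-device reference: y = W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def shard_rows(w: Matrix, num_devices: int) -> List[Matrix]:
    """Split W's rows into num_devices nearly equal contiguous shards."""
    base, extra = divmod(len(w), num_devices)
    shards, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        shards.append(w[start:start + size])
        start += size
    return shards


def tensor_parallel_matvec(w: Matrix, x: Vector, num_devices: int) -> Vector:
    # Each device sees the full input x but stores only its shard of W.
    partials = [matvec(shard, x) for shard in shard_rows(w, num_devices)]
    # Concatenating partial outputs plays the role of an all-gather.
    return [y for part in partials for y in part]


w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
x = [1.0, -1.0]
print(tensor_parallel_matvec(w, x, num_devices=2))  # [-1.0, -1.0, -1.0, -1.0, -1.0]
```

The sharded result matches the single-device product while each device stores only part of the weights; the trade-off is a communication step per layer, which is why low-latency links such as RDMA over Thunderbolt 5 matter for this style of parallelism.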

Method

EXO auto-discovers devices on the local network, measures each device's bandwidth, latency, and available memory, and then shards AI models across multiple machines, including mixed hardware, using tensor and pipeline parallelism for distributed local inference.
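One plausible way to turn measured memory into a pipeline-parallel plan is to give each device a contiguous range of layers sized in proportion to its share of total memory. The sketch below assumes that proportional rule; the Device class, partition_layers function, device names, and memory sizes are all hypothetical and are not EXO's actual API or scheduling policy, which may also weigh bandwidth and latency.

```python
# Memory-weighted pipeline partitioning (hypothetical rule, not EXO's API).
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Device:
    name: str
    memory_gb: float


def partition_layers(devices: List[Device], num_layers: int) -> Dict[str, Tuple[int, int]]:
    """Assign each device a contiguous [start, end) layer range sized in
    proportion to its share of total memory."""
    total = sum(d.memory_gb for d in devices)
    ranges: Dict[str, Tuple[int, int]] = {}
    start, cumulative = 0, 0.0
    for i, d in enumerate(devices):
        cumulative += d.memory_gb
        # Boundaries come from cumulative memory, so rounding errors do not
        # accumulate; the last device takes any remainder.
        if i == len(devices) - 1:
            end = num_layers
        else:
            end = round(num_layers * cumulative / total)
        ranges[d.name] = (start, end)
        start = end
    return ranges


cluster = [
    Device("m4-pro-macbook", 48.0),  # hypothetical memory sizes
    Device("mac-mini", 16.0),
    Device("raspberry-pi", 8.0),
]
print(partition_layers(cluster, num_layers=36))
# {'m4-pro-macbook': (0, 24), 'mac-mini': (24, 32), 'raspberry-pi': (32, 36)}
```

During inference, activations would flow from the first range to the last, so each device holds only its own layers. A very small device can end up with an empty range under this rule, which is one reason a real scheduler would likely factor in more than memory alone.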

In practice

Build AI clusters with MacBooks and Raspberry Pis.
Combine an M4 Pro machine and a Raspberry Pi in one inference cluster.
Use RDMA over Thunderbolt 5 for direct GPU-to-GPU memory transfers on supported Apple hardware.

Topics

AI Inference
Distributed AI
Peer-to-Peer Computing
Local AI
Machine Learning Hardware
Apple MLX

Best for: AI Architect, CTO, VP of Engineering/Data, Machine Learning Engineer, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Astral Codex Ten.