Run a Local LLM with OpenClaw on Your Mac Mini
Summary
This guide outlines a method for running a local Large Language Model (LLM) with OpenClaw on a Mac Mini, specifically to eliminate ongoing pay-per-token API expenses from providers like Anthropic or OpenAI. The process, tested on a Mac Mini with an M2 processor and 24GB unified memory, involves installing llama.cpp from source with Metal acceleration enabled, bypassing Ollama for a potential 70% inference speedup. It recommends using the quantized Qwen 3.5-9B parameter model, noted as a top performer as of June 2026, which fits on 16GB or 24GB Macs. The setup includes downloading an agent-compatible chat template, configuring llama-server to run as a launchd daemon, and updating OpenClaw's openclaw.json to use the local model. Verification steps involve testing model registration and a sample Python calculation skill, demonstrating speeds of 20-70 tokens per second.
Key takeaway
For AI Engineers managing OpenClaw agents on Mac Mini hardware, if you are seeking to eliminate recurring API costs, you should implement a local LLM setup. By configuring llama.cpp with a quantized model like Qwen 3.5-9B and running llama-server as a launchd daemon, you can achieve 20-70 tokens per second inference speeds. This approach avoids external API fees, making your agent operations more cost-effective and self-contained. Ensure your Mac Mini has at least an M2 processor and 24GB unified memory for optimal performance.
Key insights
Running a local, quantized LLM on a Mac Mini with llama.cpp significantly reduces OpenClaw API costs while maintaining performance for common agent tasks.
Principles
- Quantization enables larger models on limited hardware.
- llama.cpp with Metal accelerates Mac LLM inference.
- Agent-compatible templates are crucial for OpenClaw.
Method
Install llama.cpp with Metal flags, download a quantized LLM (e.g., Qwen 3.5-9B) and agent template, configure llama-server as a launchd daemon, then update OpenClaw's openclaw.json to use the local API endpoint.
In practice
- Use Qwen 3.5-9B for OpenClaw on 16GB/24GB Macs.
- Build llama.cpp with "-DGGML_METAL=ON".
- Configure launchd for automatic llama-server startup.
Topics
- Local LLMs
- OpenClaw Agents
- Mac Mini
- llama.cpp
- Model Quantization
- Qwen 3.5-9B
Code references
Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.