Run a Local LLM with OpenClaw on Your Mac Mini

2026-06-16 · Source: Towards Data Science · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Intermediate, medium

Summary

This guide outlines a method for running a local Large Language Model (LLM) with OpenClaw on a Mac Mini, specifically to eliminate ongoing pay-per-token API expenses from providers like Anthropic or OpenAI. The process, tested on a Mac Mini with an M2 processor and 24GB unified memory, involves installing llama.cpp from source with Metal acceleration enabled, bypassing Ollama for a potential 70% inference speedup. It recommends using the quantized Qwen 3.5-9B parameter model, noted as a top performer as of June 2026, which fits on 16GB or 24GB Macs. The setup includes downloading an agent-compatible chat template, configuring llama-server to run as a launchd daemon, and updating OpenClaw's openclaw.json to use the local model. Verification steps involve testing model registration and a sample Python calculation skill, demonstrating speeds of 20-70 tokens per second.

Key takeaway

For AI Engineers managing OpenClaw agents on Mac Mini hardware, if you are seeking to eliminate recurring API costs, you should implement a local LLM setup. By configuring llama.cpp with a quantized model like Qwen 3.5-9B and running llama-server as a launchd daemon, you can achieve 20-70 tokens per second inference speeds. This approach avoids external API fees, making your agent operations more cost-effective and self-contained. Ensure your Mac Mini has at least an M2 processor and 24GB unified memory for optimal performance.

Key insights

Running a local, quantized LLM on a Mac Mini with llama.cpp significantly reduces OpenClaw API costs while maintaining performance for common agent tasks.

Principles

Quantization enables larger models on limited hardware.
llama.cpp with Metal accelerates Mac LLM inference.
Agent-compatible templates are crucial for OpenClaw.

Method

Install llama.cpp with Metal flags, download a quantized LLM (e.g., Qwen 3.5-9B) and agent template, configure llama-server as a launchd daemon, then update OpenClaw's openclaw.json to use the local API endpoint.

In practice

Use Qwen 3.5-9B for OpenClaw on 16GB/24GB Macs.
Build llama.cpp with "-DGGML_METAL=ON".
Configure launchd for automatic llama-server startup.

Topics

Local LLMs
OpenClaw Agents
Mac Mini
llama.cpp
Model Quantization
Qwen 3.5-9B

Code references

ggml-org/llama.cpp

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.