Run Claude Code Locally on Apple Silicon Using LM Studio and LiteLLM (Zero Cost)
Summary
This article details a method for running Claude Code locally on macOS Apple Silicon, circumventing Anthropic API costs and leveraging high-performance MLX models. While Ollama supports local Claude Code on Windows, Linux, and Intel macOS, it lacks MLX model support, making it inefficient for Apple Silicon. The proposed solution involves using LM Studio for local LLM inference with the Qwen3-Coder-30B model, and LiteLLM as an Anthropic-to-OpenAI protocol bridge. This setup enables Claude Code to function entirely offline with zero cloud usage and API costs, providing an OpenAI-compatible Chat Completions API endpoint at `http://localhost:1234/v1` for local model interaction.
Key takeaway
For AI Engineers and MLOps teams seeking to run agentic coding tools like Claude Code locally on Apple Silicon, this setup offers a robust, cost-free alternative to cloud APIs. By bridging Anthropic's API with an OpenAI-compatible local LLM via LiteLLM, you can achieve high-performance, offline operation with MLX models. Consider implementing this architecture to reduce operational costs and enhance data privacy for your development workflows.
Key insights
Bridge Anthropic's API expectations with local OpenAI-compatible LLM runtimes for cost-free, offline agentic coding.
Principles
- Local inference enhances privacy and reduces API costs.
- Protocol translation enables tool compatibility with diverse LLMs.
Method
Configure LiteLLM as a proxy to translate Anthropic Messages API requests to OpenAI-compatible API calls for local LLM runtimes like LM Studio, using model aliasing and parameter dropping.
In practice
- Use LM Studio with Qwen3-Coder-30B for local inference.
- Map Claude Code's default model to a local alias.
- Employ `drop_params: true` in LiteLLM for compatibility.
Topics
- Agentic Coding
- Local LLM Inference
- MLX Models
- API Bridging
- Apple Silicon
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by To Data & Beyond.