Local Agentic Programming on the Cheap: Claude Code + Ollama + Gemma4

2026-04-02 · Source: KDnuggets · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Intermediate, long

Summary

This article details building a local, cost-effective agentic programming stack using Ollama, Google DeepMind's Gemma 4, and Claude Code. It focuses on Gemma 4 26B MoE, released April 2, 2026, under Apache 2.0, which activates 3.8 billion parameters and achieves 77.1% on LiveCodeBench v6 and 86.4% on τ2-bench for agentic tool use. The setup requires ~16–18 GB VRAM for a 256K context window. The guide covers installing Ollama and Claude Code, then configuring a Modelfile to override Ollama's default 4K context to 65536 tokens, setting temperature to 0.2, and adding a specific system prompt for agentic coding. It also explains wiring Claude Code to the local Ollama endpoint via "settings.json" and provides a Python script to verify the setup's health and tool-calling functionality. Common issues like tool parameter errors, context window swapping, and model unloading are addressed with specific fixes.

Key takeaway

For AI Engineers seeking to reduce cloud API costs and enhance privacy for agentic coding, implementing a local stack with Ollama, Gemma 4, and Claude Code is highly effective. You should configure a custom Modelfile to ensure adequate context window and low temperature, then verify tool-calling functionality with the provided script. This setup enables private, zero-cost execution of tasks like code analysis and test generation, freeing up cloud resources for more complex architectural challenges.

Key insights

Local agentic coding with Gemma 4 and Claude Code offers a private, cost-free alternative to cloud LLMs for daily engineering tasks.

Principles

Open-weight LLMs like Gemma 4 can match cloud models for agentic coding.
Modelfiles are crucial for optimizing local LLM context and behavior.
Low temperature improves tool call reliability in agentic loops.

Method

Install Ollama and Claude Code. Create a Modelfile for Gemma 4 to set context (65536 tokens), temperature (0.2), and system prompt. Configure Claude Code's "settings.json" to point to Ollama's local endpoint. Verify setup with a Python script.

In practice

Use "num_ctx 65536" in Modelfile to prevent context window failures.
Set "temperature 0.2" to ensure reliable tool call formatting.
Export "OLLAMA_KEEP_ALIVE=-1" to prevent model unloading delays.

Topics

Local LLMs
Agentic Programming
Gemma 4
Ollama
Claude Code
Modelfile
Tool Calling

Code references

ollama/ollama

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by KDnuggets.