Gemma 4 with Pi Coding Agent & llama.cpp | Build LLM Resource Calculator with NextJS | 🔴 Live

2026-04-07 · Source: Venelin Valkov · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, extended

Summary

This live stream demonstrates setting up Pi Coding Agent with a local llama.cpp server and using it to build a Next.js application. The project is an LLM inference resource calculator that takes model parameters, quantization, and hardware as input and predicts VRAM usage and inference speed. The demonstration runs the Gemma 4 26B-parameter model with 4-bit quantization. Setup consists of configuring `models.json` to point to the local llama.cpp server and enabling Jinja templates for more accurate tool calling. The agent generated the initial application plan, created files, and made small edits, including adding context window presets. It struggled with larger, more complex UI overhauls driven by a "front-end design" skill, often failing to specify file paths or to complete edits. A brief comparison suggested that OpenCode handled larger edits and skill integration more effectively.
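To make the calculator's goal concrete, here is a minimal TypeScript sketch of the kind of estimate such a tool might compute. It is not the code generated in the stream. It assumes a dense model (every weight is read for each generated token), an fp16 KV cache, and memory-bandwidth-bound decoding; the architecture numbers, hardware figures, and context presets are hypothetical placeholders.

```typescript
// Rule-of-thumb estimator. All names and example values here are illustrative assumptions.
interface ModelSpec {
  paramsBillions: number; // total parameters, in billions
  bitsPerWeight: number;  // e.g. 4 for 4-bit quantization
  layers: number;         // transformer layers (hypothetical below)
  kvHeads: number;        // key/value heads (hypothetical below)
  headDim: number;        // dimension per head (hypothetical below)
}

interface Hardware {
  vramGB: number;             // available VRAM
  memoryBandwidthGBs: number; // memory bandwidth in GB/s
}

const GB = 1e9;

// Hypothetical context window presets, similar in spirit to the ones added in the stream.
const CONTEXT_PRESETS = [4096, 8192, 32768, 131072];

// Weight memory: parameters * bits per weight / 8 bits per byte.
function weightBytes(m: ModelSpec): number {
  return (m.paramsBillions * 1e9 * m.bitsPerWeight) / 8;
}

// KV cache: K and V tensors per layer, per token; 2 bytes per element assumes fp16.
function kvCacheBytes(m: ModelSpec, contextTokens: number, bytesPerElement = 2): number {
  return 2 * m.layers * m.kvHeads * m.headDim * contextTokens * bytesPerElement;
}

function estimate(m: ModelSpec, hw: Hardware, contextTokens: number) {
  const weights = weightBytes(m);
  const kv = kvCacheBytes(m, contextTokens);
  const totalGB = (weights + kv) / GB;
  return {
    weightsGB: weights / GB,
    kvCacheGB: kv / GB,
    totalGB,
    fits: totalGB <= hw.vramGB,
    // Upper bound on decode speed: each token requires reading all weights once.
    tokensPerSecondUpperBound: (hw.memoryBandwidthGBs * GB) / weights,
  };
}

// Example: a 26B-parameter model at 4 bits on a hypothetical 24 GB, 1000 GB/s GPU.
const example: ModelSpec = { paramsBillions: 26, bitsPerWeight: 4, layers: 48, kvHeads: 8, headDim: 128 };
const gpu: Hardware = { vramGB: 24, memoryBandwidthGBs: 1000 };
const result = estimate(example, gpu, CONTEXT_PRESETS[1]);
console.log(result);
```

Real usage will be higher than this estimate because of activation buffers and runtime overhead, and a mixture-of-experts model would read only its active parameters per token, so treat the numbers as a starting point rather than a prediction.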

Key takeaway

For AI engineers evaluating local LLM agents for development: Pi Coding Agent paired with llama.cpp works well for simple, incremental code generation tasks. Expect roadblocks with complex UI overhauls or skill integration, where larger edits may need more explicit guidance or a different agent harness such as OpenCode. Balance ease of setup against the complexity of the coding tasks.

Key insights

Pi Coding Agent with local LLMs can build simple apps but struggles with complex UI overhauls and skill integration.

Principles

Jinja templates improve llama.cpp tool calling accuracy.
Smaller, incremental edits are more successful for LLM agents.
Agentic skills require careful integration and context injection.

Method

Set up Pi Coding Agent by configuring `models.json` to connect to a local llama.cpp server, with Jinja templating enabled on the server. Define the project goals, let the agent generate a plan, then guide it through incremental code generation and editing.

In practice

Install Pi Coding Agent globally via Homebrew or npm, e.g. `npm install -g @mariozechner/pi-coding-agent`.
Configure `models.json` in `~/.pi/agent/` for local LLM integration.
Start the llama.cpp server (`llama-server`) with the `--jinja` flag for better tool calling.
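As an illustration of what tool calling looks like on the wire, the sketch below builds an OpenAI-style chat completion request with one tool and parses tool calls out of a response. It assumes the llama.cpp server exposes its OpenAI-compatible `/v1/chat/completions` endpoint with Jinja chat templates enabled; the model name, the `read_file` tool, and the mock response are hypothetical and only serve the example.

```typescript
// OpenAI-style tool definition; the read_file tool below is a hypothetical example.
interface ToolDefinition {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments arrive as a JSON string
}

interface ChatResponse {
  choices: { message: { content: string | null; tool_calls?: ToolCall[] } }[];
}

// Build the request body for POST /v1/chat/completions.
function buildToolCallRequest(model: string, userPrompt: string, tools: ToolDefinition[]) {
  return {
    model,
    messages: [{ role: "user", content: userPrompt }],
    tools,
    tool_choice: "auto",
  };
}

// Extract tool calls and parse their JSON-encoded arguments.
// JSON.parse throws if the model emitted malformed arguments.
function extractToolCalls(res: ChatResponse): { name: string; args: Record<string, unknown> }[] {
  const calls = res.choices[0]?.message.tool_calls ?? [];
  return calls.map((c) => ({
    name: c.function.name,
    args: JSON.parse(c.function.arguments) as Record<string, unknown>,
  }));
}

const readFileTool: ToolDefinition = {
  type: "function",
  function: {
    name: "read_file",
    description: "Read a file from the project directory",
    parameters: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },
};

// "local-model" is a placeholder model name.
const body = buildToolCallRequest("local-model", "Show me package.json", [readFileTool]);

// Mock of the response shape a server might return for the request above.
const mockResponse: ChatResponse = {
  choices: [
    {
      message: {
        content: null,
        tool_calls: [
          { id: "call_1", type: "function", function: { name: "read_file", arguments: "{\"path\":\"package.json\"}" } },
        ],
      },
    },
  ],
};

console.log(JSON.stringify(body, null, 2));
console.log(extractToolCalls(mockResponse));
```

To try it against a running server, the body could be sent with `fetch("http://localhost:8080/v1/chat/completions", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(body) })`, assuming the server listens on port 8080. An agent harness depends on the model returning a well-formed `tool_calls` entry like the mocked one, which is where the chat template matters.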

Topics

Pi Coding Agent
llama.cpp
Gemma 4 Model
LLM Inference Calculator
Next.js Development

Best for: AI Engineer, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.