Gemma 4 with Pi Coding Agent & llama.cpp | Build LLM Resource Calculator with NextJS | 🔴 Live
Summary
This summary covers a live-stream demonstration of setting up the Pi Coding Agent with a local llama.cpp server and using it to build a Next.js application. The project is an LLM inference resource calculator that takes model parameters, quantization, and hardware as input and predicts VRAM usage and inference speed. The demonstration runs the 26B-parameter Gemma 4 model with 4-bit quantization. Setup consists of configuring `models.json` to point to the local llama.cpp server and enabling Jinja templates for more accurate tool calling. The agent generated the initial application plan, created files, and made small edits such as adding context window presets. It struggled with larger, more complex UI overhauls that used a "front-end design" skill, often failing to specify file paths or to complete edits. A brief comparison suggested that OpenCode handled larger edits and skill integration more effectively.
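The calculator's core estimate can be sketched roughly as follows. This is not the code the agent produced in the stream; it is a minimal back-of-the-envelope sketch that assumes a dense model, an FP16 KV cache, and memory-bandwidth-bound decoding. The type names, the `CONTEXT_PRESETS` values, and the architecture numbers (layers, KV heads, head dimension) are hypothetical placeholders, not published Gemma 4 figures.

```typescript
// Rough LLM inference resource estimate (approximation only).
// All names and example numbers here are illustrative assumptions.

interface ModelSpec {
  paramsBillions: number; // total parameter count, in billions
  bitsPerWeight: number;  // e.g. 4 for 4-bit quantization, 16 for FP16
  layers: number;         // transformer layers (placeholder value)
  kvHeads: number;        // key/value heads (placeholder value)
  headDim: number;        // dimension per head (placeholder value)
}

interface Hardware {
  vramGB: number;             // available GPU memory
  memoryBandwidthGBs: number; // peak memory bandwidth, GB/s
}

const BYTES_PER_GB = 1e9;

// Hypothetical context window presets for a UI dropdown.
const CONTEXT_PRESETS: number[] = [4096, 8192, 32768, 131072];

// Memory for the weights: parameters x bits per weight, converted to bytes.
function weightsGB(m: ModelSpec): number {
  return (m.paramsBillions * 1e9 * m.bitsPerWeight) / 8 / BYTES_PER_GB;
}

// KV cache: one K and one V tensor per layer, per token in the context.
function kvCacheGB(m: ModelSpec, contextTokens: number, bytesPerValue = 2): number {
  const bytes = 2 * m.layers * m.kvHeads * m.headDim * contextTokens * bytesPerValue;
  return bytes / BYTES_PER_GB;
}

function estimate(m: ModelSpec, hw: Hardware, contextTokens: number) {
  const weights = weightsGB(m);
  const kv = kvCacheGB(m, contextTokens);
  const total = weights + kv;
  return {
    weightsGB: weights,
    kvCacheGB: kv,
    totalGB: total,
    fitsInVram: total <= hw.vramGB,
    // Each generated token reads every weight once, so bandwidth / weight size
    // gives an optimistic upper bound on decode speed for a dense model.
    tokensPerSecondUpperBound: hw.memoryBandwidthGBs / weights,
  };
}

// Example: a 26B model at 4 bits per weight on a hypothetical 24 GB GPU.
const exampleSpec: ModelSpec = { paramsBillions: 26, bitsPerWeight: 4, layers: 48, kvHeads: 8, headDim: 128 };
const exampleResult = estimate(exampleSpec, { vramGB: 24, memoryBandwidthGBs: 1000 }, CONTEXT_PRESETS[1]);
console.log(exampleResult);
```

Real 4-bit GGUF quantizations typically store somewhat more than 4 bits per weight, and runtimes add overhead for activations and buffers, so a practical calculator would likely pad these numbers.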
Key takeaway
For AI engineers evaluating local LLM agents for development, the Pi Coding Agent paired with llama.cpp is a reasonable choice for simple, incremental code generation tasks. Expect roadblocks when attempting complex UI overhauls or integrating advanced skills: larger edits may need more explicit guidance, or a different agent harness such as OpenCode. Balance ease of setup against the complexity of the coding tasks you plan to delegate.
Key insights
The Pi Coding Agent with a local LLM can build simple apps but struggles with complex UI overhauls and skill integration.
Principles
- Jinja templates improve llama.cpp tool calling accuracy.
- Smaller, incremental edits are more successful for LLM agents.
- Agentic skills require careful integration and context injection.
Method
Set up the Pi Coding Agent by configuring `models.json` to connect to a local llama.cpp server, making sure Jinja templating is enabled on the server. Define the project goals and let the agent generate a plan, then guide it through incremental code generation and editing.
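Before writing the agent configuration, it can help to confirm what the local server exposes. The sketch below assumes llama.cpp's `llama-server` is running on its default address (`127.0.0.1:8080`) and uses its `/health` and OpenAI-compatible `/v1/models` endpoints. The helper names `serverEndpoints` and `modelIds` are made up for illustration, and the exact fields `models.json` expects are not shown here because the source does not list them.

```typescript
// Helpers for pointing a client at a local llama.cpp server (llama-server).
// Function names are illustrative; host and port are llama-server defaults.

// Minimal shape of the OpenAI-style model list returned by GET /v1/models.
interface ModelList {
  data: { id: string }[];
}

// Build the URLs a client or agent config would need.
function serverEndpoints(host = "127.0.0.1", port = 8080) {
  const root = `http://${host}:${port}`;
  return {
    health: `${root}/health`,    // liveness check
    openAiBase: `${root}/v1`,    // base URL for OpenAI-compatible clients
    models: `${root}/v1/models`, // lists the loaded model(s)
  };
}

// Extract the model ids to reference from the agent configuration.
function modelIds(list: ModelList): string[] {
  return list.data.map((m) => m.id);
}

console.log(serverEndpoints().openAiBase);
```

With the server running, fetching `serverEndpoints().models` and passing the parsed JSON to `modelIds` yields the id to use when registering the model with the agent.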
In practice
- Use `brew install py-agent` or `npm install -g py-agent` for installation.
- Configure `models.json` in `~/.pi/agent/` for local LLM integration.
- Start the llama.cpp server (`llama-server`) with the `--jinja` flag for better tool calling.
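To make the tool-calling step concrete, here is a minimal sketch of the OpenAI-style chat request an agent harness might send to the local server, plus a check for the failure mode noted in the summary (edits that omit the file path). The `write_file` tool, its parameters, and the helper names are hypothetical and are not the Pi Coding Agent's actual tool definitions.

```typescript
// Sketch of OpenAI-compatible tool calling against a local server.
// The tool definition and helper names are hypothetical.

interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>; // JSON Schema for the arguments
  };
}

interface ChatResponse {
  choices: {
    message: {
      content: string | null;
      tool_calls?: { function: { name: string; arguments: string } }[];
    };
  }[];
}

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Hypothetical file-writing tool that requires both a path and content.
const writeFileTool: ToolDefinition = {
  type: "function",
  function: {
    name: "write_file",
    description: "Write content to a file at the given path",
    parameters: {
      type: "object",
      properties: { path: { type: "string" }, content: { type: "string" } },
      required: ["path", "content"],
    },
  },
};

// Body for POST /v1/chat/completions.
function buildChatRequest(model: string, prompt: string, tools: ToolDefinition[]) {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    tools,
    tool_choice: "auto",
  };
}

// Tool arguments arrive as a JSON string and must be parsed.
function parseToolCalls(res: ChatResponse): ToolCall[] {
  const calls = res.choices[0]?.message.tool_calls ?? [];
  return calls.map((c) => ({
    name: c.function.name,
    args: JSON.parse(c.function.arguments) as Record<string, unknown>,
  }));
}

// Report required arguments the model left out (for example, the file path).
function missingArguments(call: ToolCall, required: string[]): string[] {
  return required.filter((key) => call.args[key] === undefined || call.args[key] === "");
}
```

A harness could reject any call where `missingArguments` returns a non-empty list and ask the model to retry, rather than applying a partial edit.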
Topics
- Pi Coding Agent
- llama.cpp
- Gemma 4 Model
- LLM Inference Calculator
- Next.js Development
Best for: AI Engineer, Machine Learning Engineer, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.