Karpathy's LLM Wiki with Local Gemma 4 & llama.cpp | Agentic Tool Calling Test

Source: Venelin Valkov · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Intermediate, extended

Summary

This content explores an agentic workflow for Gemma 4, building upon Andrej Karpathy's "LLM Wiki" concept. The core idea is to create a persistent, compounding knowledge base in which an LLM manages updates, summarization, and cross-referencing, moving beyond stateless RAG systems. The author demonstrates a local implementation using a 26B-parameter Gemma 4 model, quantized to 5-bit (Q5_K_M), running on a llama.cpp server on an Apple M4 with 48GB of unified memory and achieving 40-45 tokens/second. The system, called "Habit Wiki," tracks daily journals, habits, and goals, with the LLM automating bookkeeping and maintenance. The demonstration highlights Gemma 4's capabilities in tool calling (reading and writing files), understanding code, and evaluating project architecture, while also noting its limitations in complex planning compared to larger models such as Opus 4.6.
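The read/write file tool calling mentioned above can be sketched as follows. This is a minimal illustration, not the author's implementation: the tool names `read_file` and `write_file`, the schema wording, the `build_request` and `dispatch` helpers, and the sandboxing check are all assumptions. It also assumes the llama.cpp server is used through its OpenAI-compatible chat completions endpoint, so tools are described in the OpenAI-style `tools` format and the model's tool calls arrive with JSON-encoded arguments.

```python
import json
from pathlib import Path

# Hypothetical tool schemas in the OpenAI-style "tools" format.
TOOLS = [
    {"type": "function", "function": {
        "name": "read_file",
        "description": "Read a UTF-8 text file from the wiki directory.",
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"}},
                       "required": ["path"]}}},
    {"type": "function", "function": {
        "name": "write_file",
        "description": "Create or overwrite a UTF-8 text file in the wiki directory.",
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"},
                                      "content": {"type": "string"}},
                       "required": ["path", "content"]}}},
]


def build_request(messages: list) -> dict:
    """Build the JSON body for a chat completion request that offers the tools."""
    return {"messages": messages, "tools": TOOLS}


def _resolve(root: Path, relative: str) -> Path:
    """Resolve a model-supplied path and refuse anything outside the wiki root."""
    target = (root / relative).resolve()
    if not target.is_relative_to(root.resolve()):
        raise ValueError(f"path escapes wiki root: {relative}")
    return target


def dispatch(root: Path, tool_call: dict) -> str:
    """Execute one tool call and return the text to send back to the model."""
    fn = tool_call["function"]
    try:
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        target = _resolve(root, args["path"])
        if fn["name"] == "read_file":
            return target.read_text(encoding="utf-8")
        if fn["name"] == "write_file":
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(args["content"], encoding="utf-8")
            return "ok"
        return f"error: unknown tool {fn['name']}"
    except (OSError, ValueError, KeyError) as exc:
        # Report failures as text so the model can correct itself and retry.
        return f"error: {exc}"
```

In an agent loop, the body from `build_request` would be posted to the server, each returned tool call passed to `dispatch`, and the result appended to the conversation as a tool message before the next request. Returning errors as strings rather than raising keeps a smaller model in the loop when it produces a bad path or malformed arguments.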

Key takeaway

AI engineers exploring local LLM applications should consider implementing Karpathy's LLM Wiki concept to build persistent, evolving knowledge systems. Focus on designing robust index structures and tool-calling mechanisms, as demonstrated with Gemma 4 and llama.cpp, to automate knowledge management and overcome the limitations of stateless RAG, even with smaller, quantized models.

Key insights

Karpathy's LLM Wiki concept offers a vectorless, agentic approach to knowledge management, with LLMs maintaining an evolving, indexed knowledge base.

Method

An LLM acts as a librarian: it ingests new sources, categorizes information, and updates a markdown-based wiki through an index file. Its core operations are ingestion, querying, and linting (health checks).
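The index-file bookkeeping described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Habit Wiki's actual layout: the `index.md` file name, the `[[page]]` link syntax, and the `ingest`, `query`, and `health_check` functions are all hypothetical. In the real workflow the LLM decides what to write and would perform these steps through its file tools; the functions below only show the data structure it maintains.

```python
import re
from pathlib import Path

# Assumed wiki-link syntax: [[page-name]]
LINK = re.compile(r"\[\[([^\]]+)\]\]")


def ingest(wiki: Path, page: str, summary: str, body: str) -> None:
    """Write a page and register it (or refresh its entry) in index.md."""
    wiki.mkdir(parents=True, exist_ok=True)
    (wiki / f"{page}.md").write_text(body, encoding="utf-8")
    index = wiki / "index.md"
    lines = index.read_text(encoding="utf-8").splitlines() if index.exists() else []
    # Drop any stale entry for this page before adding the new one.
    lines = [line for line in lines if not line.startswith(f"- [[{page}]]")]
    lines.append(f"- [[{page}]]: {summary}")
    index.write_text("\n".join(sorted(lines)) + "\n", encoding="utf-8")


def query(wiki: Path, keyword: str) -> list[str]:
    """Vectorless lookup: scan the index for a keyword and return matching pages."""
    hits = []
    for line in (wiki / "index.md").read_text(encoding="utf-8").splitlines():
        match = LINK.search(line)
        if match and keyword.lower() in line.lower():
            hits.append(match.group(1))
    return hits


def health_check(wiki: Path) -> dict[str, list[str]]:
    """Report, per page, links that point at pages which do not exist."""
    pages = {path.stem for path in wiki.glob("*.md")}
    broken = {}
    for path in sorted(wiki.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        missing = sorted({target for target in LINK.findall(text) if target not in pages})
        if missing:
            broken[path.stem] = missing
    return broken
```

Because lookup goes through a small, human-readable index rather than an embedding store, the model can read the whole index in one tool call and then open only the pages it needs. The health check gives it a concrete list of dangling links to repair on a maintenance pass.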

Best for: Machine Learning Engineer, AI Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.