Karpathy's LLM Wiki with Local Gemma 4 & llama.cpp | Agentic Tool Calling Test
Summary
This content explores an agentic workflow for Gemma 4, building upon Andrej Karpathy's "LLM Wiki" concept. The core idea is to create an immutable, compounding knowledge base where an LLM manages updates, summarization, and cross-referencing, moving beyond stateless RAG systems. The author demonstrates a local implementation using a 26B parameter Gemma 4 model, quantized to 5-bit (Q5KM), running on a llama.cpp server on an Apple M4 with 48GB unified memory, achieving 40-45 tokens/second. The system, called "Habit Wiki," tracks daily journals, habits, and goals, with the LLM automating bookkeeping and maintenance. The demonstration highlights Gemma 4's capabilities in tool calling (read/write files), understanding code, and evaluating project architecture, while also noting limitations in complex planning compared to larger models like Opus 4.6.
Key takeaway
For AI Engineers exploring local LLM applications, consider implementing Karpathy's LLM Wiki concept to build persistent, evolving knowledge systems. Your focus should be on designing robust index structures and tool-calling mechanisms, as demonstrated with Gemma 4 and llama.cpp, to automate knowledge management and overcome the limitations of stateless RAG, even with smaller, quantized models.
Key insights
Karpathy's LLM Wiki concept offers a vectorless, agentic approach to knowledge management, with LLMs maintaining an evolving, indexed knowledge base.
Principles
- LLMs can automate wiki maintenance.
- Index files enable selective context loading.
- Compounding knowledge improves over stateless RAG.
Method
An LLM acts as a librarian, ingesting new sources, categorizing information, and updating a markdown-based wiki via an index file, performing operations like ingestion, querying, and linking (health checks).
In practice
- Run Gemma 4 (26B, 5-bit quant) on 24GB+ VRAM.
- Use llama.cpp for local LLM serving.
- Implement a "Habit Wiki" for personal tracking.
Topics
- Karpathy's LLM Wiki
- Gemma 4
- Local LLMs
- Agentic Tool Calling
- llama.cpp
Best for: Machine Learning Engineer, AI Engineer, AI Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.