Five labs, five minds: building a multi-model finance drama on small models
Summary
The second Build Small Hackathon report, published June 6, 2026, describes Thousand Token Wood v2, a multi-model finance drama game. This iteration turns the original weather-god sandbox into an interactive experience in which the player acts as the "Patron of the Wood," a shadow financier manipulating an emergent economy. Each of the five agents runs on a different small model, including gpt-oss-20b (OpenAI), MiniCPM3-4B (OpenBMB), Nemotron-Mini-4B (NVIDIA), and a fine-tuned Qwen 0.5B. The report covers three engineering challenges and their solutions: handling model heterogeneity, mostly at the vLLM 0.22.1 serving layer; building an information firewall that prevents insider-tip leaks by keeping sensitive data off-prompt; and bounding agent memory with one-line sentiment summaries to avoid prompt inflation. In a representative run, the 0.5B model proved reliable, no information leaked, and the game successfully simulated complex market and relationship dynamics.
Key takeaway
If you are an AI engineer building multi-agent simulations, treat small models as reliable format generators, not reasoners, and compensate with structural design and fine-tuning. When deploying heterogeneous models, harden the serving layer first, because that is where most friction occurs. Put any secret agent-held information behind a data-flow firewall and prove it with tests. Bounded memory summaries give agents persistent behavior without prompt inflation.
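To make the bounded-memory idea concrete, here is a minimal sketch, not the report's implementation. It assumes each interaction is scored as +1, 0, or -1, keeps only a fixed window of recent scores per counterpart, and renders them as a single capped line for the prompt. The class name `BoundedMemory`, the window size, the thresholds, and the wording of the summary line are all hypothetical.

```python
from collections import deque

MAX_EVENTS = 20   # assumed window of remembered dealings per counterpart
MAX_CHARS = 120   # assumed hard cap on the line that enters the prompt


class BoundedMemory:
    """Keeps a fixed-size window of sentiment scores per counterpart."""

    def __init__(self, max_events: int = MAX_EVENTS):
        self.max_events = max_events
        self._scores = {}  # counterpart name -> deque of +1 / 0 / -1

    def record(self, counterpart: str, score: int) -> None:
        # deque(maxlen=...) silently drops the oldest score when full.
        window = self._scores.setdefault(
            counterpart, deque(maxlen=self.max_events)
        )
        window.append(score)

    def summary_line(self, counterpart: str, max_chars: int = MAX_CHARS) -> str:
        scores = self._scores.get(counterpart)
        if not scores:
            return f"No history with {counterpart}."[:max_chars]
        mean = sum(scores) / len(scores)
        if mean > 0.25:
            mood = "trust"
        elif mean < -0.25:
            mood = "distrust"
        else:
            mood = "feel neutral toward"
        line = f"You {mood} {counterpart} ({len(scores)} recent dealings)."
        return line[:max_chars]
```

Because the window and the line length are both capped, the prompt cost of memory stays constant however long the game runs; only the most recent dealings shape the summary.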
Key insights
Building multi-agent simulations with heterogeneous small models requires robust serving layers and careful information management.
Principles
- Heterogeneity in agents enhances market realism.
- Serving layer friction dominates multi-model setups.
- Agents' secret information needs a data-flow firewall.
Method
Implement a tolerant JSON parse-and-repair layer for diverse model outputs. Store secret information off-prompt, stripping it from public records, and scan prompts for banned tokens. Summarize agent history into bounded sentiment lines.
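A tolerant parse-and-repair layer might look like the sketch below. It is an illustration under stated assumptions, not the report's code: it assumes each model is asked for a single JSON object and that the common failure modes are markdown fences, surrounding chatter, trailing commas, and truncated closing braces. The function names `parse_tolerant` and `_repair` are hypothetical.

````python
import json
import re
from typing import Optional


def _repair(candidate: str) -> str:
    """Apply cheap, heuristic fixes for common small-model JSON slips."""
    # Close any braces the model forgot (e.g. output cut off by max_tokens).
    missing = candidate.count("{") - candidate.count("}")
    if missing > 0:
        candidate += "}" * missing
    # Drop trailing commas before a closing brace or bracket.
    # Heuristic: this could also touch a ",}" inside a string value.
    return re.sub(r",\s*([}\]])", r"\1", candidate)


def parse_tolerant(raw: str) -> Optional[dict]:
    """Return the first JSON object found in raw model output, or None."""
    # Remove markdown code fences such as ```json ... ```.
    text = re.sub(r"```(?:json)?", "", raw).strip()
    start = text.find("{")
    if start == -1:
        return None
    end = text.rfind("}")
    candidate = text[start:end + 1] if end > start else text[start:]
    # Try the text as-is first, then the repaired version.
    for attempt in (candidate, _repair(candidate)):
        try:
            value = json.loads(attempt)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None
````

For example, `parse_tolerant('Sure! {"action": "buy", "qty": 3,}')` returns `{'action': 'buy', 'qty': 3}`, while output with no object at all returns `None`, so the caller can fall back to a default action instead of crashing the turn.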
In practice
- Base the vLLM image on a CUDA devel image.
- Build a tolerant JSON parse-and-repair layer.
- Scan agent prompts for banned tokens.
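The firewall steps above (keep secrets off-prompt, strip them from public records, scan prompts for banned tokens) can be sketched as follows. This is a minimal illustration, not the report's implementation: the field name `insider_tip`, the record shape, and the functions `public_view`, `find_banned`, and `build_prompt` are all assumptions made for the example.

```python
SECRET_KEYS = {"insider_tip"}  # hypothetical field that must never reach a prompt


def public_view(record: dict) -> dict:
    """Copy a record with every secret field removed."""
    return {k: v for k, v in record.items() if k not in SECRET_KEYS}


def find_banned(prompt: str, banned: set[str]) -> list[str]:
    """Return the banned tokens that appear in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return sorted(token for token in banned if token.lower() in lowered)


def build_prompt(agent_name: str, records: list[dict], banned: set[str]) -> str:
    """Assemble a prompt from public data only, then scan it as a last gate."""
    lines = [f"You are {agent_name}. Recent public events:"]
    for record in map(public_view, records):
        lines.append(f"- {record['actor']}: {record['event']}")
    prompt = "\n".join(lines)
    leaked = find_banned(prompt, banned)
    if leaked:
        # Fail loudly rather than send a contaminated prompt to a model.
        raise ValueError(f"prompt for {agent_name} leaks banned tokens: {leaked}")
    return prompt
```

The two layers are deliberately redundant: stripping secret fields handles the expected path, and the token scan catches leaks through free-text fields. Running the scan inside a test suite over every generated prompt is one way to back a "zero leaks" claim with evidence.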
Topics
- Multi-agent Systems
- Small Language Models
- LLM Serving
- Information Security
- Prompt Engineering
- Game AI
Best for: AI Engineer, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.