Five labs, five minds: building a multi-model finance drama on small models

· Source: Hugging Face - Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, medium

Summary

The second Build Small Hackathon report, published June 6, 2026, details Thousand Token Wood v2, a multi-model finance drama game. This iteration turns the original weather-god sandbox into an interactive experience in which the player acts as the "Patron of the Wood," a shadow financier manipulating an emergent economy. Each of the game's five agents runs on a different small model; the lineup includes gpt-oss-20b (OpenAI), MiniCPM3-4B (OpenBMB), Nemotron-Mini-4B (NVIDIA), and a fine-tuned Qwen 0.5B. The engineering report highlights three challenges and their solutions: handling model heterogeneity primarily at the vLLM 0.22.1 serving layer, enforcing an information firewall that keeps sensitive data off-prompt so insider tips cannot leak, and bounding agent memory with one-line sentiment summaries to avoid prompt inflation. A representative run confirmed the 0.5B model's reliability, showed zero information leaks, and successfully simulated complex market and relationship dynamics.

Key takeaway

AI engineers building multi-agent simulations should treat small models as reliable format generators, not reasoners, and compensate with structural design and fine-tuning. When deploying heterogeneous models, harden the serving layer first, since that is where most friction occurs. Put a data-flow firewall, verified by tests, around any secret information an agent holds. Bounded memory summaries are crucial for giving agents persistent behavior without prompt inflation.
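
The firewall and bounded-memory ideas in this takeaway can be illustrated with a minimal Python sketch. It is not the code from the original report: the `Agent` class, the `secret` record key, the 80-character cap, and the helper names are all hypothetical, and keying the sentiment lines by counterpart is an assumption. The point is only that secrets never enter the prompt builder, that a test-style scan can check this, and that memory is overwritten instead of appended.

```python
from dataclasses import dataclass, field

MAX_LINE_CHARS = 80  # hypothetical cap on each sentiment line


@dataclass
class Agent:
    name: str
    # Secret tips live here; build_prompt never reads this field.
    secrets: dict = field(default_factory=dict)
    # One short sentiment line per counterpart (assumed layout).
    sentiment: dict = field(default_factory=dict)


def remember(agent: Agent, other: str, line: str) -> None:
    """Overwrite the single sentiment line for `other`, truncated to the cap."""
    agent.sentiment[other] = " ".join(line.split())[:MAX_LINE_CHARS]


def public_view(record: dict) -> dict:
    """Return a copy of a ledger record with the hypothetical 'secret' key removed."""
    return {k: v for k, v in record.items() if k != "secret"}


def build_prompt(agent: Agent, ledger: list) -> str:
    """Assemble a prompt from public data and bounded memory only."""
    lines = [f"You are {agent.name}."]
    lines += [f"Feeling about {o}: {s}" for o, s in sorted(agent.sentiment.items())]
    lines += [f"Trade: {public_view(r)}" for r in ledger]
    return "\n".join(lines)


def assert_no_leak(prompt: str, banned: list) -> None:
    """Raise ValueError if any banned token appears in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    leaked = [t for t in banned if t.lower() in lowered]
    if leaked:
        raise ValueError(f"prompt leaks banned tokens: {leaked}")
```

In a test suite, `assert_no_leak` would run against every prompt the simulation builds, with the banned list drawn from each agent's `secrets`. Because `remember` replaces the previous line, prompt size grows with the number of counterparts, not with the number of turns.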

Key insights

Building multi-agent simulations with heterogeneous small models requires robust serving layers and careful information management.

Method

Implement a tolerant JSON parse-and-repair layer for diverse model outputs. Store secret information off-prompt, stripping it from public records, and scan prompts for banned tokens. Summarize agent history into bounded sentiment lines.
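
As an illustration of the first step, here is a minimal sketch of a tolerant parse-and-repair function in Python. It assumes each model is asked for a single JSON object and that the common failures are surrounding chatter and trailing commas. The function name `parse_model_json` and the fallback behavior are hypothetical, not taken from the original report.

```python
import json
import re
from typing import Optional


def parse_model_json(raw: str, default: Optional[dict] = None) -> dict:
    """Best-effort extraction of one JSON object from noisy model output."""
    fallback = dict(default or {})
    # Keep only the span from the first "{" to the last "}", which drops
    # greetings, code fences, and trailing commentary around the object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return fallback
    candidate = raw[start:end + 1]
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        # Repair pass: remove trailing commas before a closing brace or bracket.
        # Caveat: this can also alter a string value that contains ",}" or ",]".
        repaired = re.sub(r",\s*([}\]])", r"\1", candidate)
        try:
            parsed = json.loads(repaired)
        except json.JSONDecodeError:
            return fallback
    # Anything other than a JSON object falls back to the default.
    return parsed if isinstance(parsed, dict) else fallback
```

Returning a caller-supplied default instead of raising lets the simulation continue with a safe no-op action when one model misbehaves. Whether that is preferable to retrying the request depends on the game loop.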

Best for: AI Engineer, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.