Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models
Summary
NVIDIA researchers introduce GenCluster, a scalable test-time compute framework that enables open-weight large language models (LLMs) to achieve gold medal-level performance at the International Olympiad in Informatics (IOI) 2025. This framework addresses the challenge of matching proprietary models' performance with transparent, reproducible methods. GenCluster integrates large-scale solution generation, behavioral clustering, LLM-based ranking via a tournament, and a round-robin submission strategy to navigate IOI's strict validation budgets and submission limits. Experiments demonstrate that the gpt-oss-120b model, when combined with GenCluster and 5000 generations per subtask, achieved a gold medal score of 446.75, marking the first time an open-weight model has reached this level. The approach shows consistent performance scaling with increased compute, narrowing the gap between open and closed AI systems in competitive programming.
Key takeaway
For research scientists developing competitive programming LLMs, GenCluster offers a transparent and reproducible method to achieve top-tier performance with open-weight models. Consider implementing its four-stage pipeline (parallel generation, behavioral clustering, tournament-based ranking, and round-robin submission) to maximize scores under strict competition constraints. The results indicate that scaling test-time compute is a key lever for narrowing the performance gap between open and proprietary systems.
Key insights
GenCluster enables open-weight LLMs to achieve IOI gold medal performance through scalable test-time compute and strategic solution selection.
Principles
- Test-time compute scales LLM performance.
- Behavioral clustering improves solution selection.
- LLM-as-a-judge can rank code solutions.
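The LLM-as-a-judge principle can be illustrated with a small ranking sketch. This is not the paper's implementation: it assumes a full pairwise tournament, and `judge` is a hypothetical callable standing in for an LLM call that returns the preferred of two candidates. The stub judge below simply prefers the longer string so the example runs without a model.

```python
from itertools import combinations
from typing import Callable, Dict, List


def tournament_rank(candidates: List[str],
                    judge: Callable[[str, str], str]) -> List[str]:
    """Rank unique candidates by how many pairwise comparisons they win.

    `judge(a, b)` is a hypothetical callable that must return either `a`
    or `b`; in a real system it would wrap an LLM prompt.
    """
    wins: Dict[str, int] = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        wins[judge(a, b)] += 1
    # sorted() is stable, so tied candidates keep their input order.
    return sorted(candidates, key=lambda c: wins[c], reverse=True)


def stub_judge(a: str, b: str) -> str:
    # Placeholder preference: the longer candidate wins, ties go to `a`.
    return a if len(a) >= len(b) else b


ranking = tournament_rank(["a", "ccc", "bb"], stub_judge)
print(ranking)  # ['ccc', 'bb', 'a']
```

A full pairwise tournament needs a quadratic number of judge calls, so a practical system would likely compare each candidate against only a sample of opponents.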
Method
GenCluster generates many candidate solutions, clusters them by behavioral similarity, ranks clusters using an LLM-based tournament, and employs a round-robin submission strategy under IOI constraints.
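As a rough illustration of the clustering step, the sketch below groups candidates that produce identical outputs on a shared set of test inputs. It makes several assumptions that do not come from the source: plain Python functions stand in for compiled candidate programs, the test inputs are hard-coded, and exact output equality is the similarity criterion.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def cluster_by_behavior(solutions: Dict[str, Callable[[int], int]],
                        test_inputs: Sequence[int]) -> List[List[str]]:
    """Group solution names whose outputs match on every test input."""
    clusters: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for name, solve in solutions.items():
        # The tuple of outputs acts as the solution's behavioral signature.
        signature = tuple(solve(x) for x in test_inputs)
        clusters[signature].append(name)
    # Largest clusters first; ties keep first-seen order (stable sort).
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical candidates: two agree everywhere, one diverges.
candidates = {
    "double": lambda x: x * 2,
    "add_self": lambda x: x + x,
    "square": lambda x: x ** 2,
}
groups = cluster_by_behavior(candidates, [1, 2, 3])
print(groups)  # [['double', 'add_self'], ['square']]
```

Clustering this way means only one representative per group needs to be judged or submitted, which matters when the number of generations is in the thousands.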
In practice
- Generate about 5000 solutions per subtask; the reported gold-medal score used 5000 generations, and scores improved consistently as compute increased.
- Use C++ for competitive programming solutions.
- Use the longest reasoning trace as a proxy for correctness.
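One way to combine the reasoning-trace heuristic with round-robin submission is sketched below. The details are assumptions for illustration: clusters arrive already ranked from best to worst, each candidate carries a hypothetical trace length in tokens, and `budget` is a placeholder for whatever submission limit applies.

```python
from typing import List, Tuple

Candidate = Tuple[str, int]  # (solution id, reasoning-trace length)


def round_robin_order(ranked_clusters: List[List[Candidate]],
                      budget: int) -> List[str]:
    """Interleave submissions across clusters, longest trace first."""
    # Within each cluster, try candidates with longer traces first.
    queues = [sorted(c, key=lambda s: s[1], reverse=True)
              for c in ranked_clusters]
    order: List[str] = []
    depth = 0
    # Each pass takes the next untried candidate from every cluster.
    while len(order) < budget and any(depth < len(q) for q in queues):
        for q in queues:
            if depth < len(q) and len(order) < budget:
                order.append(q[depth][0])
        depth += 1
    return order


clusters = [
    [("a1", 100), ("a2", 300)],  # top-ranked cluster
    [("b1", 50)],
    [("c1", 10), ("c2", 20)],
]
print(round_robin_order(clusters, budget=4))  # ['a2', 'b1', 'c2', 'a1']
```

Cycling across clusters before going deeper into any one of them spreads a limited budget over behaviorally distinct solutions instead of spending it on near-duplicates.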
Topics
- Competitive Programming
- Large Language Models
- Test-Time Compute
- GenCluster
- IOI Gold Medal
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.