Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution
Summary
Solvita is an agentic evolution framework that improves large language models (LLMs) at competitive programming through continuous learning, without updating the weights of the underlying LLM. It reorganizes problem-solving into a closed loop of four specialized agents: Planner, Solver, Oracle, and Hacker. Each agent is paired with a trainable, graph-structured knowledge network. Outcome signals, such as pass/fail verdicts, test certification quality, and adversarial vulnerabilities, are recast as reinforcement learning updates to the weights of these networks, so agents can route future queries based on past successes and failures. Evaluated on CodeContests, APPS, AetherCode, and live Codeforces rounds, Solvita sets a new state of the art among code-generation agents: it outperforms existing multi-agent pipelines and nearly doubles the accuracy of single-pass baselines, reaching a pass@1 accuracy of 82.4% on CodeContests with a GPT-5.4 backbone.
Key takeaway
For research scientists developing advanced code generation systems, Solvita demonstrates that pairing a multi-agent framework with trainable, graph-structured knowledge networks can significantly boost performance in competitive programming. Consider adopting similar agentic evolution principles, particularly reinforcement learning from execution feedback, to enable continuous improvement while the LLM itself stays frozen, with the potential to reach "Legendary Grandmaster"-level performance in coding challenges.
Key insights
Solvita enhances LLMs for competitive programming through a multi-agent framework whose knowledge networks are continuously updated from experience via reinforcement learning.
Principles
- Continuous learning without LLM weight updates.
- Specialized agents with graph-structured knowledge networks.
- Reinforcement learning from outcome signals.
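To make the three principles concrete, here is a minimal sketch of a weighted knowledge graph whose edge weights move toward observed rewards while the LLM stays untouched. The class name `KnowledgeNetwork`, the tag and strategy labels, and the simple running-average update rule are all assumptions chosen for illustration; the summary does not specify Solvita's actual network structure or update rule.

```python
from collections import defaultdict


class KnowledgeNetwork:
    """Toy weighted graph: edges link a problem tag to a strategy node."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        # weights[tag][strategy] -> learned preference, starting at 0.0
        self.weights = defaultdict(lambda: defaultdict(float))

    def add_edge(self, tag, strategy):
        # Touching the entry creates the edge with its default weight.
        self.weights[tag][strategy] += 0.0

    def route(self, tag):
        """Return the strategy with the highest weight for this tag."""
        edges = self.weights[tag]
        if not edges:
            raise KeyError(f"no strategies registered for tag {tag!r}")
        # Ties are broken alphabetically so routing is deterministic.
        return max(sorted(edges), key=lambda s: edges[s])

    def update(self, tag, strategy, reward):
        """Move the edge weight toward the observed reward (hypothetical rule)."""
        old = self.weights[tag][strategy]
        self.weights[tag][strategy] = old + self.learning_rate * (reward - old)


net = KnowledgeNetwork()
net.add_edge("graph", "bfs")
net.add_edge("graph", "dp")
net.update("graph", "bfs", -1.0)  # a failed verdict lowers the "bfs" edge
print(net.route("graph"))         # routing now prefers the other strategy
```

Only the graph weights change between calls, which is the sense in which learning continues without LLM weight updates.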
Method
Solvita employs a closed-loop system of strategy selection (Planner), program synthesis (Solver), certified supervision (Oracle), and targeted hacking (Hacker), with each agent's knowledge network updated via reinforcement learning based on problem outcomes.
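The closed loop described above can be sketched as a small driver that passes control from Planner to Solver to Oracle to Hacker and feeds failures back into the next round. Everything here is a simplified assumption: the agents are plain Python callables standing in for LLM-backed components, and `solve_loop`, `LoopResult`, and the toy agents are hypothetical names, not Solvita's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    accepted: bool
    program: Optional[Callable[[int], int]]
    rounds: int
    feedback: List[str] = field(default_factory=list)


def solve_loop(planner, solver, oracle, hacker, problem, max_rounds=3):
    """Run plan -> solve -> certify -> hack until a program survives."""
    feedback, program = [], None
    for round_no in range(1, max_rounds + 1):
        strategy = planner(problem, feedback)   # strategy selection
        program = solver(problem, strategy)     # program synthesis
        tests = oracle(problem)                 # certified (input, output) pairs
        failed = [(x, y) for x, y in tests if program(x) != y]
        if failed:
            feedback.append(f"{strategy}: failed certified test {failed[0]}")
            continue
        counterexample = hacker(problem, program)  # targeted hacking
        if counterexample is not None:
            feedback.append(f"{strategy}: hacked with input {counterexample}")
            continue
        return LoopResult(True, program, round_no, feedback)
    return LoopResult(False, program, max_rounds, feedback)


# Toy agents for the problem "return the absolute value of x".
def toy_planner(problem, feedback):
    return "naive" if not feedback else "handle_negatives"


def toy_solver(problem, strategy):
    if strategy == "naive":
        return lambda x: x
    return lambda x: -x if x < 0 else x


def toy_oracle(problem):
    return [(3, 3), (0, 0)]


def toy_hacker(problem, program):
    # Searches small inputs for a violated property: the output must be >= 0.
    for x in range(-5, 6):
        if program(x) < 0:
            return x
    return None


result = solve_loop(toy_planner, toy_solver, toy_oracle, toy_hacker, "absolute value")
print(result.accepted, result.rounds, result.feedback)
```

In this toy run the naive program passes the Oracle's tests but is broken by the Hacker, and the recorded feedback steers the Planner to a different strategy in the next round. In a fuller sketch, the same feedback would also drive the knowledge-network updates.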
In practice
- Implement patch-based repair for efficient code debugging.
- Use adversarial testing to expose subtle code bugs.
- Structure agent memory as a learned routing mechanism.
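As one illustration of the adversarial-testing point in the list above, the sketch below uses differential (stress) testing: random small inputs are fed to a candidate solution and to a slow brute-force reference, and the first disagreement is returned as a hack. This is a standard competitive-programming technique swapped in for illustration, not a description of how Solvita's Hacker works; the maximum-subarray task and all function names are assumptions.

```python
import random


def brute_force_max_subarray(values):
    """O(n^2) reference: try every non-empty contiguous slice."""
    best = values[0]
    for i in range(len(values)):
        total = 0
        for j in range(i, len(values)):
            total += values[j]
            best = max(best, total)
    return best


def buggy_max_subarray(values):
    """Kadane variant with a subtle bug: it silently allows the empty subarray."""
    best = current = 0
    for v in values:
        current = max(0, current + v)
        best = max(best, current)
    return best


def correct_max_subarray(values):
    """Standard Kadane over non-empty subarrays."""
    best = current = values[0]
    for v in values[1:]:
        current = max(v, current + v)
        best = max(best, current)
    return best


def find_counterexample(candidate, reference, trials=200, seed=0):
    """Return the first random input where candidate and reference disagree."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    for _ in range(trials):
        values = [rng.randint(-5, 5) for _ in range(rng.randint(1, 5))]
        if candidate(values) != reference(values):
            return values
    return None


hack = find_counterexample(buggy_max_subarray, brute_force_max_subarray)
print("hack input:", hack)
print("correct solution hacked:",
      find_counterexample(correct_max_subarray, brute_force_max_subarray) is not None)
```

Small input ranges are deliberate: bugs like the all-negative case tend to surface quickly on tiny arrays, and the resulting counterexamples are easy to read.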
Topics
- Solvita
- Agentic Evolution Framework
- Competitive Programming
- Graph-Structured Knowledge Networks
- Reinforcement Learning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.