Hexo Labs Open-Sources SIA: A Self-Improving Agent That Updates Both the Harness and the Model Weights

2026-05-29 · Source: Machine Learning ML & Generative AI News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, quick

Summary

Hexo Labs has open-sourced SIA, a Self-Improving Agent that updates both the agent's operational harness and its underlying model weights within a single iterative loop. Most self-improving agents do one or the other: they either rewrite the scaffold or train the model through a reinforcement learning (RL) pipeline. SIA instead uses a Feedback-Agent that analyzes each run's full trajectory and decides whether to modify the harness or update the weights. The reported gains: on LawBench, top-1 accuracy rose from 50.0% to 70.1% (+20.1 pp); for the TriMul CUDA kernel, execution time fell from 12,483 µs to 1,017 µs (a 91.9% reduction, roughly a 12.3x speedup); and on scRNA-seq denoising, mse_norm improved from 0.241 to 0.289. The Feedback-Agent also selects the RL method per task, for example PPO with GAE for LawBench and GRPO for denoising. SIA uses gpt-oss-120b as its base model with LoRA rank 32, and Claude Sonnet 4.6 for the meta and feedback agents.
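The headline figures can be sanity-checked with a few lines of arithmetic. The sketch below only recomputes the deltas from the before/after values quoted in the summary; the function names are illustrative and not part of SIA.

```python
def percentage_point_gain(before_pct, after_pct):
    """Absolute difference between two percentages, in percentage points."""
    return after_pct - before_pct


def relative_reduction_pct(before, after):
    """How much smaller `after` is than `before`, as a percent of `before`."""
    return 100.0 * (before - after) / before


def speedup_factor(before, after):
    """Ratio of old runtime to new runtime."""
    return before / after


# Values as quoted in the summary.
lawbench_gain = percentage_point_gain(50.0, 70.1)
trimul_reduction = relative_reduction_pct(12483, 1017)
trimul_speedup = speedup_factor(12483, 1017)
denoise_delta = 0.289 - 0.241

print(f"LawBench gain: {lawbench_gain:.1f} pp")
print(f"TriMul runtime reduction: {trimul_reduction:.1f}%")
print(f"TriMul speedup: {trimul_speedup:.1f}x")
print(f"Denoising mse_norm delta: {denoise_delta:.3f}")
```

Note that a 91.9% drop in runtime and a 12.3x speedup describe the same change; the percentage is measured against the original runtime.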

Key takeaway

For AI engineers building self-improving agents, SIA suggests that optimizing the agent's harness and its model weights in the same loop can break through plateaus that either approach hits on its own. Consider implementing a feedback mechanism that decides, run by run, between scaffold modifications and weight updates, and that picks the RL approach to suit each task. The strategy can yield substantial gains, as the 20.1 pp accuracy increase on LawBench shows, but be mindful of the added training complexity and of how well it applies to your own tasks.

Key insights

SIA is a self-improving agent that optimizes both the operational harness and the model weights within a single loop, rather than tuning only one of them.

Principles

Dual optimization (harness + weights) surpasses single-knob tuning.
Dynamic RL method selection improves task-specific performance.
Weight updates can introduce implicit domain knowledge.
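To make the second principle concrete, the two RL methods named in the summary differ mainly in how they estimate advantages. The sketch below uses the standard textbook formulations of GAE (as used with PPO) and of GRPO's group-relative normalization; it is not taken from the SIA codebase, and the default hyperparameters are illustrative assumptions.

```python
from statistics import mean, pstdev


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    `values` must hold len(rewards) + 1 entries: a critic value per step
    plus a bootstrap value for the state after the last step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then an exponentially weighted backward sum.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: each sampled response is scored against
    the mean and standard deviation of its own group, with no critic."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]


# PPO-style: needs per-step value estimates from a learned critic.
print(gae_advantages([0.0, 0.0, 1.0], [0.2, 0.4, 0.7, 0.0]))
# GRPO-style: needs only the final rewards of several samples per prompt.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

The practical trade-off is that GAE requires training a value model alongside the policy, while the group-relative form needs several sampled responses per prompt. That is one plausible reason a feedback agent might prefer different methods for different tasks, though the summary does not state SIA's selection criteria.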

Method

A Feedback-Agent reads each run's full trajectory, then decides whether to rewrite the harness or update the model weights, selecting an RL method suited to the task.
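The control flow described above can be sketched roughly as follows. Everything here is hypothetical: in SIA the decision is made by an LLM-based Feedback-Agent reading the trajectory, whereas this sketch substitutes a trivial rule so it stays runnable. The `Trajectory` fields, the `Action` names, and the rule itself are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REWRITE_HARNESS = "rewrite_harness"
    UPDATE_WEIGHTS = "update_weights"


@dataclass
class Trajectory:
    """Hypothetical summary of one agent run."""
    score: float          # task metric achieved by the run
    harness_errors: int   # e.g. tool-call or output-parsing failures


def choose_action(traj: Trajectory) -> Action:
    """Placeholder for the Feedback-Agent's judgment (assumed rule).

    If the run failed because of the scaffold, fix the scaffold first;
    otherwise treat the model itself as the bottleneck.
    """
    if traj.harness_errors > 0:
        return Action.REWRITE_HARNESS
    return Action.UPDATE_WEIGHTS


def plan_iterations(trajectories):
    """Map each observed run to the next improvement step."""
    return [choose_action(t) for t in trajectories]


runs = [
    Trajectory(score=0.50, harness_errors=3),
    Trajectory(score=0.58, harness_errors=0),
]
for run, action in zip(runs, plan_iterations(runs)):
    print(f"score={run.score:.2f} -> {action.value}")
```

The point of the design is that both improvement paths hang off one decision point, so a single loop can alternate between them as the bottleneck shifts.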

In practice

Apply dual-loop optimization for complex agent systems.
Experiment with dynamic RL strategy selection.
Integrate weight updates to capture implicit domain knowledge.
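On the weight-update side, the summary notes that SIA trains LoRA adapters at rank 32 rather than full weights. A quick count shows why that keeps each update cheap. The layer dimensions below are hypothetical round numbers, not gpt-oss-120b's actual shapes; only the rank comes from the summary.

```python
def lora_param_count(d_in, d_out, rank):
    """LoRA adds two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_param_count(d_in, d_out):
    """Parameters in the dense weight matrix being adapted."""
    return d_in * d_out


d_in = d_out = 4096   # hypothetical layer size, for illustration only
rank = 32             # rank reported in the summary

adapter = lora_param_count(d_in, d_out, rank)
full = full_param_count(d_in, d_out)
fraction = adapter / full

print(f"adapter params: {adapter:,}")
print(f"full params:    {full:,}")
print(f"trainable fraction: {fraction:.2%}")
```

For this assumed layer size, the adapter trains well under 2% of the parameters in the matrix it modifies, which makes repeated weight updates inside an improvement loop more practical.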

Topics

Self-Improving Agents
Reinforcement Learning
Large Language Models
Model Optimization
Agent Architectures
Hexo Labs

Code references

hexo-ai/sia

Best for: Research Scientist, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.