Tree-of-Experience: A Structured Experience-Management Solution for Self-Evolving Agents under Low-Repetition and Implicit-Reward Environments
Summary
The paper introduces FinEvolveBench, a new benchmark for evaluating self-evolving LLM agents in challenging low-repetition tasks with implicit, delayed, and noisy rewards, specifically financial sentiment prediction. Unlike existing benchmarks, FinEvolveBench links daily news-driven predictions to future excess returns for 31 Shenwan first-level industry indices in the Chinese A-share market, covering January 2025 to March 2026. The authors also propose Tree-of-Experience (ToE), a structured experience-management method that organizes, retrieves, validates, and updates agent experience. Experiments using DeepSeek-V4-Flash and Qwen3.6-35B-A3B show that ToE significantly outperforms no-experience baselines and general-purpose experience mechanisms, achieving tsIC of 0.0741 and csIC of 0.0528 on a 20-day horizon with DeepSeek-V4-Flash, highlighting the importance of structured experience in such complex environments.
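To make the reported metrics concrete, here is a minimal sketch of how tsIC and csIC might be computed. It assumes tsIC is the correlation between predictions and future excess returns taken across days for each industry and then averaged over industries, and csIC is the same correlation taken across industries for each day and then averaged over days. The summary does not state the paper's exact definitions (for example, Pearson versus rank correlation), so the function names, the choice of Pearson correlation, and the toy data are all illustrative assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def ts_ic(pred, ret):
    """Assumed time-series IC: correlate across days per industry, then average.

    pred[d][i] is the sentiment score for industry i on day d;
    ret[d][i] is the matching future excess return.
    """
    n_industries = len(pred[0])
    per_industry = [
        pearson([row[i] for row in pred], [row[i] for row in ret])
        for i in range(n_industries)
    ]
    return sum(per_industry) / n_industries


def cs_ic(pred, ret):
    """Assumed cross-sectional IC: correlate across industries per day, then average."""
    per_day = [pearson(p, r) for p, r in zip(pred, ret)]
    return sum(per_day) / len(per_day)


# Toy data: 3 days x 3 industries (made-up numbers, not from the paper).
toy_pred = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1], [-0.3, 0.2, 0.5]]
toy_ret = [[0.02, -0.01, 0.01], [0.03, 0.00, -0.02], [-0.01, 0.01, 0.04]]
print(ts_ic(toy_pred, toy_ret), cs_ic(toy_pred, toy_ret))
```

Under these assumptions, values such as 0.0741 and 0.0528 would indicate a weak but positive alignment between predictions and realized excess returns, which is typical of noisy financial targets.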
Key takeaway
Machine learning engineers building self-evolving LLM agents for real-world settings with noisy, delayed feedback should consider structured experience management. Organizing historical interactions into hierarchical, updatable patterns can help, as Tree-of-Experience's results on financial sentiment prediction suggest. Prefer formula-based utility updates over direct LLM rewriting for stability, and note that the benefit of experience may be more pronounced at longer prediction horizons.
Key insights
Structured experience management is crucial for LLM agents in low-repetition, implicit-reward environments like financial markets.
Principles
- Experience reuse needs validation.
- Numerical updates stabilize utility.
- Structured experience improves long-horizon prediction.
Method
Tree-of-Experience (ToE) organizes experience as a depth-constrained tree. For financial sentiment prediction, it combines hierarchical selection, adaptive expansion, and runtime utility estimation through formula-based updates.
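The structure described above can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the `ExperienceNode` class, the keyword-overlap matching rule, the `MAX_DEPTH` value, and the tie-break on utility are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 3  # hypothetical depth constraint; the paper's value is not given here


@dataclass
class ExperienceNode:
    pattern: str            # natural-language description of the experience
    keywords: set           # hypothetical matching keys for retrieval
    utility: float = 0.0    # running estimate of how useful the pattern has been
    depth: int = 0
    children: list = field(default_factory=list)

    def add_child(self, pattern, keywords, utility=0.0):
        """Adaptive expansion: attach a more specific pattern unless too deep."""
        if self.depth + 1 > MAX_DEPTH:
            return None  # depth constraint reached, refuse to expand
        child = ExperienceNode(pattern, set(keywords), utility, self.depth + 1)
        self.children.append(child)
        return child


def select_path(root, query_words):
    """Hierarchical selection: descend level by level through matching children.

    At each level, pick the child with the largest keyword overlap,
    breaking ties by higher utility. Stop when no child matches.
    """
    path = [root]
    node = root
    while node.children:
        matching = [c for c in node.children if c.keywords & query_words]
        if not matching:
            break
        node = max(matching, key=lambda c: (len(c.keywords & query_words), c.utility))
        path.append(node)
    return path


# Usage with made-up patterns.
tree = ExperienceNode("root", set())
policy = tree.add_child("policy news", {"policy", "subsidy"}, utility=0.2)
tree.add_child("earnings news", {"earnings"}, utility=0.1)
policy.add_child("subsidy announcements tend to lift the sector", {"subsidy"}, utility=0.3)
print([n.pattern for n in select_path(tree, {"subsidy", "solar"})])
```

Returning the whole root-to-leaf path, rather than only the leaf, lets an agent see both the general and the specific pattern when it builds its prompt. That is one plausible reading of "hierarchical selection", not a confirmed detail of ToE.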
In practice
- Implement hierarchical experience structures.
- Use formula-based utility updates.
- Expect experience to help most on longer prediction horizons.
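As a rough illustration of what a formula-based utility update could look like, the sketch below uses an exponential moving average over a sign-agreement reward. The summary does not give the paper's actual formula, so the update rule, the reward definition, and the `alpha` step size are assumptions chosen for this example.

```python
def reward_from_outcome(predicted_score, excess_return):
    """Hypothetical reward: +1 if the prediction's sign matched the realized
    excess return, -1 if it was opposite, 0 if either value is zero."""
    product = predicted_score * excess_return
    if product > 0:
        return 1.0
    if product < 0:
        return -1.0
    return 0.0


def update_utility(utility, reward, alpha=0.1):
    """Exponential moving average: a small step toward the newest reward.

    With rewards in [-1, 1] and a starting utility in [-1, 1], the result
    stays in [-1, 1], so one noisy outcome cannot swing the estimate far.
    """
    return (1.0 - alpha) * utility + alpha * reward


# Usage: replay delayed outcomes for one experience pattern (made-up numbers).
utility = 0.0
outcomes = [(0.6, 0.012), (0.4, -0.003), (-0.5, -0.020), (0.7, 0.008)]
for predicted_score, excess_return in outcomes:
    utility = update_utility(utility, reward_from_outcome(predicted_score, excess_return))
print(round(utility, 4))
```

The design point is that a fixed numerical rule changes the stored utility by a bounded amount per observation, whereas asking an LLM to rewrite the experience after each noisy outcome offers no such guarantee.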
Topics
- LLM Agents
- Experience Management
- Financial Sentiment Analysis
- FinEvolveBench
- Tree-of-Experience
- Implicit Rewards
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.