Tree-of-Experience: A Structured Experience-Management Solution for Self-Evolving Agents under Low-Repetition and Implicit-Reward Environments

2024-01-01 · Source: cs.CL updates on arXiv.org · Field: Finance & Economics — FinTech & Digital Financial Services, Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

The paper introduces FinEvolveBench, a benchmark for evaluating self-evolving LLM agents on low-repetition tasks with implicit, delayed, and noisy rewards, instantiated here as financial sentiment prediction. Unlike existing benchmarks, FinEvolveBench links daily news-driven predictions to future excess returns for the 31 Shenwan first-level industry indices in the Chinese A-share market, covering January 2025 to March 2026. The authors also propose Tree-of-Experience (ToE), a structured experience-management method that organizes, retrieves, validates, and updates agent experience. Experiments with DeepSeek-V4-Flash and Qwen3.6-35B-A3B show that ToE significantly outperforms both no-experience baselines and general-purpose experience mechanisms. With DeepSeek-V4-Flash at the 20-day horizon, ToE reaches a tsIC of 0.0741 and a csIC of 0.0528, which underlines the value of structured experience in such environments.
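The summary reports tsIC and csIC without defining them. A common reading in quantitative finance is that these are time-series and cross-sectional information coefficients, i.e. correlations between predicted sentiment scores and realized forward excess returns. The sketch below assumes that reading and uses Pearson correlation; the paper may instead use rank (Spearman) correlation or a different averaging scheme. The function names are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; nan if lengths differ, n < 2, or either side is constant."""
    n = len(x)
    if n < 2 or n != len(y):
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def nanmean(values: List[float]) -> float:
    """Mean that skips nan entries (e.g. a day on which every score was identical)."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else float("nan")


def ts_ic(scores: List[List[float]], returns: List[List[float]]) -> float:
    """Assumed tsIC: correlate each industry's scores with its returns over time,
    then average across industries. scores[t][i] is day t, industry i."""
    n_ind = len(scores[0])
    per_industry = [
        pearson([row[i] for row in scores], [row[i] for row in returns])
        for i in range(n_ind)
    ]
    return nanmean(per_industry)


def cs_ic(scores: List[List[float]], returns: List[List[float]]) -> float:
    """Assumed csIC: correlate scores with returns across industries on each day,
    then average across days."""
    return nanmean([pearson(s, r) for s, r in zip(scores, returns)])
```

Under this reading, tsIC asks whether the agent tracks each industry's ups and downs over time, while csIC asks whether it ranks industries correctly against one another on a given day; the two can diverge, which is one reason to report both.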

Key takeaway

Machine Learning Engineers building self-evolving LLM agents for real-world settings with noisy, delayed feedback should consider structured experience management. Organizing historical interactions into hierarchical, updatable patterns is likely to help, as Tree-of-Experience's results on financial sentiment prediction suggest. Prefer formula-based utility updates over direct LLM rewriting for stability, and keep in mind that the benefits of experience may be more pronounced at longer prediction horizons.

Key insights

Structured experience management is crucial for LLM agents in low-repetition, implicit-reward environments like financial markets.

Principles

Experience reuse needs validation.
Numerical updates stabilize utility.
Structured experience improves long-horizon prediction.
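The first two principles can be illustrated together. In the minimal sketch below, each stored experience keeps a numeric utility that is updated by a fixed formula (an exponential moving average of rewards) instead of having an LLM rewrite the lesson text, and an experience is reused only after it has passed a simple validation gate. The direction-agreement reward, the smoothing factor `alpha`, and the `min_obs`/`threshold` gate are all hypothetical choices for illustration; the summary does not specify ToE's actual formula.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    lesson: str          # natural-language lesson; never rewritten by the update below
    utility: float = 0.0
    n_obs: int = 0       # resolved predictions that drew on this experience


def sign(x: float) -> int:
    # bool arithmetic yields an int in Python: 1, 0, or -1
    return (x > 0) - (x < 0)


def direction_reward(sentiment: float, excess_return: float) -> int:
    """Hypothetical reward: +1 if the predicted direction matched the realized
    excess return, -1 if it did not, 0 if either side is exactly zero."""
    return sign(sentiment) * sign(excess_return)


def update_utility(exp: Experience, reward: float, alpha: float = 0.2) -> None:
    """Formula-based update: exponential moving average of observed rewards."""
    exp.utility = (1.0 - alpha) * exp.utility + alpha * reward
    exp.n_obs += 1


def is_validated(exp: Experience, min_obs: int = 5, threshold: float = 0.0) -> bool:
    """Reuse gate: enough resolved outcomes and a positive running utility."""
    return exp.n_obs >= min_obs and exp.utility > threshold
```

Because each noisy outcome moves the utility by at most a fraction `alpha` of the gap, a single lucky or unlucky day cannot overturn an experience's standing, whereas an LLM rewrite could discard the lesson wholesale after one bad outcome.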

Method

Tree-of-Experience (ToE) organizes experience as a depth-constrained tree and manages it through hierarchical selection, adaptive expansion, and runtime utility estimation with formula-based updates, applied here to financial sentiment prediction.
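The Method line names three mechanisms without detail. The sketch below shows one hypothetical way to combine them: experiences are filed under a label path (for example a news category and subcategory), the tree never grows past `max_depth`, a node sprouts a child branch only once it already holds `capacity` experiences, and selection walks down the query's path, pools the experiences of every node it visits, and ranks them by utility. The labels, the capacity rule, and the ranking are assumptions for illustration, not ToE's published design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    label: str
    depth: int
    experiences: List[Tuple[str, float]] = field(default_factory=list)  # (lesson, utility)
    children: Dict[str, "Node"] = field(default_factory=dict)


class ExperienceTree:
    def __init__(self, max_depth: int = 3, capacity: int = 4):
        self.root = Node("root", 0)
        self.max_depth = max_depth  # depth constraint: no node deeper than this
        self.capacity = capacity    # a node must hold this many lessons before it branches

    def insert(self, path: Sequence[str], lesson: str, utility: float = 0.0) -> Node:
        """Adaptive expansion: descend along `path`, creating a child only when
        the current node is already crowded and the depth cap allows it."""
        node = self.root
        for label in path:
            if node.depth >= self.max_depth:
                break  # depth cap reached: store at the deepest allowed level
            child = node.children.get(label)
            if child is None:
                if len(node.experiences) < self.capacity:
                    break  # node not yet crowded: keep the lesson general
                child = Node(label, node.depth + 1)
                node.children[label] = child
            node = child
        node.experiences.append((lesson, utility))
        return node

    def select(self, path: Sequence[str], k: int = 3) -> List[str]:
        """Hierarchical selection: pool lessons from root to the deepest matching
        node on `path`, then return the top-k by utility."""
        node, pool = self.root, list(self.root.experiences)
        for label in path:
            child = node.children.get(label)
            if child is None:
                break
            node = child
            pool.extend(node.experiences)
        pool.sort(key=lambda e: e[1], reverse=True)
        return [lesson for lesson, _ in pool[:k]]
```

For example, with `capacity=2` the first two lessons filed under `["macro", "rates"]` stay at the root as general lessons, and only the third triggers a `macro` branch. Keeping general lessons near the root means every query still sees them, which suits a low-repetition setting where exact matches are rare.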

In practice

Implement hierarchical experience structures.
Use formula-based utility updates.
Prioritize long-horizon predictions for experience benefits.
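Rewards in FinEvolveBench are delayed: a prediction made on day t at horizon h can only be scored once the h-day window has closed. The sketch below computes a forward excess return as the index's return minus a benchmark's return over the same window and holds predictions in a queue until they can be resolved. The benchmark series is an assumption (the summary does not name it), prices are indexed by position in the price series, and `DelayedRewardQueue` is a hypothetical helper that expects predictions to be pushed in day order.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


def forward_excess_return(index: Sequence[float], bench: Sequence[float], t: int, h: int) -> float:
    """Return of the index from day t to t+h minus the benchmark's return over the same window."""
    return (index[t + h] / index[t] - 1.0) - (bench[t + h] / bench[t] - 1.0)


class DelayedRewardQueue:
    """Holds predictions until their h-day outcome is observable (hypothetical helper)."""

    def __init__(self, horizon: int):
        self.h = horizon
        self.pending: Deque[Tuple[int, float]] = deque()  # (prediction day, sentiment score)

    def push(self, day: int, score: float) -> None:
        self.pending.append((day, score))  # assumes non-decreasing days (FIFO order)

    def resolve(self, today: int, index: Sequence[float], bench: Sequence[float]) -> List[Tuple[int, float, float]]:
        """Pop every prediction whose window day..day+h has closed by `today`."""
        done = []
        while self.pending and self.pending[0][0] + self.h <= today:
            day, score = self.pending.popleft()
            done.append((day, score, forward_excess_return(index, bench, day, self.h)))
        return done
```

With h = 20, as in the reported results, each prediction waits 20 periods of the price series before it can feed any utility update, so an agent working at long horizons accumulates validated experience more slowly even if that experience turns out to be more useful.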

Topics

LLM Agents
Experience Management
Financial Sentiment Analysis
FinEvolveBench
Tree-of-Experience
Implicit Rewards

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.