LEAF: A Living Benchmark for Event-Augmented Forecasting

2026-05-19 · Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

LEAF is introduced as the first living benchmark for event-augmented forecasting. It addresses two limitations of existing benchmarks, which either lack multidimensional event data or are confined to closed environments. The benchmark evaluates the predictive capabilities of Large Language Models (LLMs) in complex, real-world scenarios across three task types: future event probabilities, trend forecasting, and time series forecasting. To supply relevant auxiliary text, LEAF combines a recursive retrieval agent system with dual-agent cross-validation. Evaluations of state-of-the-art proprietary and open-weight LLMs show that these models can use signals from complex events to improve predictive performance. In the stock market domain, LLMs performed better on equities they confidently identified as more predictable, where events correlated strongly with the target equities. As a dynamically updating testbed, LEAF is intended to track and advance event-driven forecasting continuously.

Key takeaway

For research scientists evaluating LLMs for forecasting, LEAF provides a dynamically updating benchmark for assessing model performance against real-world, multidimensional event data. Consider integrating event-augmented approaches: LLMs show improved predictive performance when they use complex event signals, particularly in domains such as equity forecasting where events correlate strongly with the target.

Key insights

LEAF is a living benchmark for event-augmented forecasting, enabling LLM evaluation in complex, real-world scenarios.

Principles

Multidimensional event data can improve forecasting accuracy.
LLMs can leverage complex event signals.
Predictability confidence correlates with performance.
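
The third principle can be checked with a simple stratified comparison. The sketch below is an illustration only, not LEAF's evaluation code: it assumes each forecast record carries a model-reported predictability confidence in [0, 1] and a correctness flag. The `Forecast` fields, the function name, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Forecast:
    confidence: float  # hypothetical: model-reported predictability confidence in [0, 1]
    correct: bool      # whether the forecast matched the realized outcome


def accuracy_by_confidence(
    forecasts: List[Forecast], threshold: float = 0.5
) -> Tuple[Optional[float], Optional[float]]:
    """Return (accuracy on high-confidence items, accuracy on low-confidence items).

    A bucket with no items yields None instead of a misleading 0.0.
    """
    high = [f.correct for f in forecasts if f.confidence >= threshold]
    low = [f.correct for f in forecasts if f.confidence < threshold]

    def accuracy(flags: List[bool]) -> Optional[float]:
        return sum(flags) / len(flags) if flags else None

    return accuracy(high), accuracy(low)
```

If high-confidence accuracy consistently exceeds low-confidence accuracy across benchmark snapshots, that would be consistent with the principle. A single split at one threshold is a coarse check, and a rank correlation over all items may be more informative.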

Method

LEAF uses a recursive retrieval agent system with dual-agent cross-validation to provide comprehensive auxiliary text for forecasting tasks, mitigating pre-training data contamination.
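
The summary does not describe how the retrieval system works internally, so the following is only one plausible reading of "recursive retrieval with dual-agent cross-validation", sketched under stated assumptions. Every callable here (`search`, `follow_ups`, `judge_a`, `judge_b`) is a hypothetical stand-in for a retrieval backend or an LLM agent, and the rule that a document is kept only when both judges accept it is an assumption, not a confirmed detail of LEAF.

```python
from typing import Callable, List, Set


def recursive_retrieve(
    query: str,
    target: str,
    search: Callable[[str], List[str]],       # hypothetical: query -> candidate documents
    follow_ups: Callable[[str], List[str]],   # hypothetical: document -> follow-up queries
    judge_a: Callable[[str, str], bool],      # hypothetical: first relevance agent
    judge_b: Callable[[str, str], bool],      # hypothetical: second relevance agent
    max_depth: int = 2,
) -> List[str]:
    """Collect documents relevant to `target`, expanding queries recursively."""
    kept: List[str] = []
    seen_docs: Set[str] = set()
    seen_queries: Set[str] = set()

    def visit(current_query: str, depth: int) -> None:
        # Stop at the depth limit and never re-run a query already issued.
        if depth > max_depth or current_query in seen_queries:
            return
        seen_queries.add(current_query)
        for doc in search(current_query):
            if doc in seen_docs:
                continue
            seen_docs.add(doc)
            # Assumed cross-validation rule: both agents must accept the document.
            if judge_a(doc, target) and judge_b(doc, target):
                kept.append(doc)
                # Only accepted documents seed further retrieval.
                for next_query in follow_ups(doc):
                    visit(next_query, depth + 1)

    visit(query, 0)
    return kept
```

Requiring agreement between two independent judges trades recall for precision, which suits auxiliary text that is fed directly into a forecasting prompt. The depth limit and the seen-query set keep the recursion bounded.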

In practice

Evaluate LLMs for event-driven stock forecasting.
Incorporate event data for improved trend prediction.
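
As a rough illustration of incorporating event data, the sketch below assembles an event-augmented prompt from a price history and a list of dated events, dropping anything after the forecast cutoff so that no future information leaks in. The function name, the prompt wording, and the data shapes are assumptions made for this example and are not taken from LEAF.

```python
from datetime import date
from typing import List, Tuple


def build_event_prompt(
    series: List[Tuple[date, float]],   # (observation date, value)
    events: List[Tuple[date, str]],     # (event date, short description)
    cutoff: date,
    horizon_days: int = 5,
) -> str:
    """Format history and events up to `cutoff` into a single forecasting prompt."""
    # Keep only information available on or before the cutoff date.
    history = [(d, v) for d, v in sorted(series) if d <= cutoff]
    context = [(d, text) for d, text in sorted(events) if d <= cutoff]

    lines = ["Recent values:"]
    lines += [f"{d.isoformat()}: {v:.2f}" for d, v in history]
    lines.append("Relevant events:")
    lines += [f"{d.isoformat()}: {text}" for d, text in context]
    lines.append(
        f"Will the value be higher {horizon_days} days after {cutoff.isoformat()}? "
        "Answer with a probability between 0 and 1."
    )
    return "\n".join(lines)
```

The resulting string can be sent to any LLM under evaluation. Comparing forecasts made with and without the "Relevant events" block gives a simple ablation of the event signal.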

Topics

Event-Augmented Forecasting
Living Benchmark
Large Language Models
Recursive Retrieval Agent
Dual-Agent Cross-Validation

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.