Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

2026-04-09 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Researchers have introduced OmniBehavior, a user simulation benchmark built exclusively from real-world data. It is designed to overcome the limitations of existing benchmarks, which rely on isolated scenarios, narrow action spaces, or synthetic data, by integrating long-horizon, cross-scenario, and heterogeneous behavioral patterns into a unified framework. Initial evaluations on the benchmark show that datasets built around isolated scenarios suffer from "tunnel vision," whereas real decision-making involves long-term causal chains that span scenarios. State-of-the-art Large Language Models (LLMs) struggle to simulate these complex behaviors accurately, and their performance does not improve significantly even with larger context windows. A key finding is a structural bias: LLMs tend to simulate a "positive average person," which produces hyper-activity, persona homogenization, and a Utopian bias, so individual differences and long-tail behaviors are lost.

Key takeaway

If you develop user simulators, recognize that current LLMs exhibit a structural bias toward an "average person" persona, which leads to hyper-activity and a loss of individual differences. Shift your focus toward models that capture long-horizon, cross-scenario, and heterogeneous behavioral patterns rather than isolated scenarios, and consider validating your models' fidelity to authentic human behavior against real-world benchmarks such as OmniBehavior.

Key insights

Simulating real-world human behavior requires modeling long-horizon, cross-scenario behavior traces, and current LLMs struggle to do so accurately.

Principles

Real-world behavior is long-term and cross-scenario.
LLMs exhibit a "Utopian bias" in simulation.

Method

OmniBehavior is a user simulation benchmark constructed from real-world data, integrating long-horizon, cross-scenario, and heterogeneous behavioral patterns for evaluating LLMs.
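To make "long-horizon, cross-scenario, and heterogeneous" concrete, here is a minimal sketch of how such behavior traces might be represented and inspected. It assumes a hypothetical event schema (`BehaviorEvent` with `user_id`, `timestamp`, `scenario`, `action`) and hypothetical helpers `build_traces` and `scenario_switches`; none of these names come from the paper, and OmniBehavior's actual data format may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout for illustration only; not OmniBehavior's real schema.
@dataclass(frozen=True)
class BehaviorEvent:
    user_id: str
    timestamp: int   # e.g. seconds since epoch
    scenario: str    # e.g. "search", "video", "shopping"
    action: str      # heterogeneous action type, e.g. "watch", "purchase"


def build_traces(events):
    """Group events into one chronologically ordered trace per user."""
    traces = defaultdict(list)
    for event in events:
        traces[event.user_id].append(event)
    for trace in traces.values():
        trace.sort(key=lambda e: e.timestamp)
    return dict(traces)


def scenario_switches(trace):
    """Count consecutive event pairs that occur in different scenarios."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev.scenario != cur.scenario)


# Usage: one user moves across three scenarios, another stays in one.
events = [
    BehaviorEvent("u1", 30, "shopping", "purchase"),
    BehaviorEvent("u1", 10, "video", "watch"),
    BehaviorEvent("u1", 20, "search", "query"),
    BehaviorEvent("u2", 5, "video", "like"),
]
traces = build_traces(events)
print(scenario_switches(traces["u1"]))  # 2: video -> search -> shopping
```

Keeping a single time-ordered trace per user, rather than one trace per scenario, is what lets a simulator (or an evaluator) see causal chains that cross scenario boundaries; an isolated-scenario dataset would split `u1` into three unrelated one-event histories.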

In practice

Use OmniBehavior to evaluate how realistically LLMs simulate users.
Address LLM bias towards "average person" behavior.
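One way to check a simulator for the "positive average person" bias is to compare per-user activity between real logs and simulated output. The sketch below is a hypothetical diagnostic, not the metric used in the paper: `activity_diagnostics`, `activity_ratio`, and `dispersion_ratio` are illustrative names. An activity ratio above 1 would suggest hyper-activity, and a dispersion ratio below 1 would suggest persona homogenization (simulated users differ from one another less than real users do).

```python
from statistics import mean, pstdev


def activity_diagnostics(real_counts, sim_counts):
    """Compare per-user action counts from real logs and from an LLM simulator.

    Hypothetical diagnostics, not OmniBehavior's official metrics:
      activity_ratio > 1   -> simulated users act more than real ones (hyper-activity)
      dispersion_ratio < 1 -> simulated users vary less than real ones (homogenization)
    """
    if set(real_counts) != set(sim_counts):
        raise ValueError("real and simulated counts must cover the same users")
    users = sorted(real_counts)
    real = [real_counts[u] for u in users]
    sim = [sim_counts[u] for u in users]
    real_mean = mean(real)
    real_std = pstdev(real)
    if real_mean == 0 or real_std == 0:
        raise ValueError("real counts need a nonzero mean and spread")
    return {
        "activity_ratio": mean(sim) / real_mean,
        "dispersion_ratio": pstdev(sim) / real_std,
    }


# Usage: real users range from nearly idle to very active; the simulator
# makes every user equally (and more) active.
report = activity_diagnostics({"a": 1, "b": 2, "c": 9}, {"a": 6, "b": 6, "c": 6})
print(report)  # {'activity_ratio': 1.5, 'dispersion_ratio': 0.0}
```

Comparing spread as well as the mean matters here: a simulator could match the average activity level exactly while still collapsing every persona onto it, which is the loss of individual differences and long-tail behaviors described above.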

Topics

Large Language Models
Human Behavior Simulation
OmniBehavior Benchmark
Real-world Data
LLM Bias

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.