ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

ArcANE (Arc-Aware Narrative Evaluation) is a new benchmark that tests whether role-playing language agents (RPLAs) stay consistent with a character whose psychological state evolves through a story, rather than only recalling facts about that character. The benchmark is constructed automatically from 17 novels and 80 principal characters, with each narrative segmented into psychological phases. Agents are probed with identical scenarios across these phases, covering situations both within and beyond the source text. Across six models and six context modes, conditioning on the Character Arc consistently outperformed the other context strategies, with the largest gap in scenarios outside the source text. Fine-tuning open-weight models produced ArcANE-8B/32B, which further widened the Character Arc advantage in out-of-source scenarios.

Key takeaway

For NLP engineers who build or evaluate role-playing language agents, factual-recall benchmarks alone do not show whether an agent stays in character as that character changes. In the reported results, conditioning the agent on the character arc improved alignment with the character's psychological trajectory, most of all in scenarios outside the source text. Benchmarks such as ArcANE can be used to check whether an agent's values and behavior shift appropriately with the story phase.

Key insights

Evaluating role-playing language agents requires going beyond factual recall to test whether they track a character's psychological evolution.

Principles

A character's psychological trajectory is central to realistic RPLAs.
Benchmarks built around factual recall do not measure character evolution.
Character Arc conditioning consistently outperformed the other context strategies across the six models tested.

Method

ArcANE segments narratives into psychological phases, then probes agents with identical scenarios across these phases, including situations not explicitly in the source text.
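
The probing design described above can be pictured as a grid of phases and scenarios. The following is a minimal sketch, not the benchmark's actual construction pipeline: the names Phase, Scenario, Probe, and build_probes are hypothetical, as is the in_source flag used to mark whether a situation appears in the novel.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Phase:
    index: int        # position of the phase within the character's arc
    description: str  # short summary of the psychological state (hypothetical field)


@dataclass(frozen=True)
class Scenario:
    prompt: str       # the situation put to the agent
    in_source: bool   # True if the situation occurs in the source text


@dataclass(frozen=True)
class Probe:
    character: str
    phase: Phase
    scenario: Scenario


def build_probes(character, phases, scenarios):
    """Pair every scenario with every phase, so the same situation
    is posed at each point of the character's arc."""
    return [Probe(character, p, s) for s, p in product(scenarios, phases)]
```

Because each scenario is held fixed while the phase varies, differences between an agent's answers can be attributed to the phase it was asked to play rather than to the wording of the question.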

In practice

Condition LLMs on Character Arc for improved RPLA performance.
Fine-tune open-weight models using ArcANE-like data.
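
As an illustration of the first point, the sketch below assembles a prompt that places the full arc in context and marks the phase the agent should play. The prompt wording and the function name build_arc_prompt are assumptions made for this example; the summary does not specify the format ArcANE uses for its Character Arc context mode.

```python
def build_arc_prompt(character, arc_phases, current_phase, scenario):
    """Build a role-play prompt that lists the whole arc and flags the current phase.

    arc_phases is a list of one-line phase summaries in story order;
    current_phase is a zero-based index into that list (hypothetical interface).
    """
    if not 0 <= current_phase < len(arc_phases):
        raise ValueError("current_phase must index into arc_phases")

    lines = [f"You are role-playing {character}.", "Character arc:"]
    for i, summary in enumerate(arc_phases):
        # Mark only the phase the agent is expected to inhabit.
        marker = " (current phase)" if i == current_phase else ""
        lines.append(f"{i + 1}. {summary}{marker}")
    lines.append("Respond as the character would in the current phase.")
    lines.append(f"Scenario: {scenario}")
    return "\n".join(lines)
```

The returned string can be passed as the system or user message to whichever model is being evaluated. Keeping the scenario line separate from the arc makes it straightforward to reuse one scenario across every phase.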

Topics

Role-Playing Language Agents
Character Arc
Narrative Evaluation
LLM Benchmarking
Context Conditioning
Fine-tuning

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.