Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization

2026-06-25 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

General-purpose role-playing agents often struggle with faithful character portrayal and out-of-distribution generalization because they rely on superficial behavioral mimicry. To address this, the authors propose Psy-CoT, a psychology-grounded chain-of-thought framework that decomposes pre-response reasoning into three role-specific steps: Interaction Perception, Psychological Empathy, and Logical Construction. The framework has the agent reason from the character profile rather than pattern-match on surface behavior. They also introduce Role-Aware Policy Optimization (RAPO) to counter generic phrases that "hack" the reward model during reinforcement learning. RAPO uses profile-token mutual information to weight gradients asymmetrically, amplifying role-specific tokens under positive advantage and attenuating them under negative advantage. Experiments on CoSER, CharacterBench, and CharacterEval show that Psy-CoT outperforms existing role-playing CoT methods and that RAPO consistently surpasses GRPO across multiple model scales.

Key takeaway

Machine learning engineers whose role-playing agents struggle with character fidelity or out-of-distribution generalization should consider combining psychology-grounded reasoning with role-aware reinforcement learning. Psy-CoT's three-step reasoning framework targets shallow mimicry, and RAPO's gradient weighting keeps generic responses from exploiting the reward model. Together they can lead to more robust and believable character portrayals.

Key insights

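Combining psychology-grounded reasoning with role-aware policy optimization improves the fidelity of general role-playing agents: on CoSER, CharacterBench, and CharacterEval, Psy-CoT outperforms existing role-playing CoT methods, and RAPO surpasses GRPO across multiple model scales.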

Principles

Reasoning from the character profile, rather than mimicking surface behavior, improves out-of-distribution generalization.
Reinforcement learning is essential for aligning models with character fidelity.
Generic phrases can "hack" LLM-based reward models, misleading RL training.

Method

Psy-CoT decomposes pre-response reasoning into three steps: Interaction Perception, Psychological Empathy, and Logical Construction. RAPO uses profile-token mutual information to weight gradients asymmetrically, amplifying role-specific tokens under positive advantage and attenuating them under negative advantage.
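The asymmetric weighting can be illustrated with a minimal sketch. The summary does not give RAPO's formula, so everything here is an assumption: role-specificity is approximated as the clipped difference between a token's log-probability with and without the character profile in context (a pointwise stand-in for profile-token mutual information), and the weight is a linear function of that score controlled by a hypothetical alpha parameter. The function names are illustrative, not the authors'.

```python
from typing import List


def profile_token_scores(
    logp_with_profile: List[float],
    logp_without_profile: List[float],
) -> List[float]:
    """Per-token role-specificity scores in [0, 1].

    ASSUMPTION: uses log p(token | profile, context) - log p(token | context)
    as a pointwise proxy for profile-token mutual information. Negative
    values are clipped to 0 and the rest are scaled by the largest value.
    """
    if len(logp_with_profile) != len(logp_without_profile):
        raise ValueError("log-probability lists must have the same length")
    pmi = [max(a - b, 0.0) for a, b in zip(logp_with_profile, logp_without_profile)]
    peak = max(pmi, default=0.0)
    if peak == 0.0:
        return [0.0] * len(pmi)
    return [value / peak for value in pmi]


def role_aware_weights(
    scores: List[float],
    advantage: float,
    alpha: float = 0.5,
) -> List[float]:
    """Per-token multipliers for the policy-gradient term.

    ASSUMPTION: a linear rule. With positive advantage, role-specific tokens
    get a weight above 1 (amplified); with negative advantage they get a
    weight below 1 (attenuated). Generic tokens (score 0) keep weight 1.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if advantage > 0:
        return [1.0 + alpha * s for s in scores]
    if advantage < 0:
        return [1.0 - alpha * s for s in scores]
    return [1.0] * len(scores)


# Toy example: the first token becomes much more likely once the profile is
# in context, the second is unaffected, the third becomes less likely.
scores = profile_token_scores([-0.2, -1.0, -3.0], [-2.2, -1.0, -2.5])
weights_good = role_aware_weights(scores, advantage=1.3)
weights_bad = role_aware_weights(scores, advantage=-0.7)
```

In a GRPO-style trainer, these multipliers would scale each token's contribution to the loss before summation, so a generic phrase earns no extra credit when the sampled response happens to score well.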

In practice

Implement multi-step, psychology-grounded reasoning for agent responses.
Apply mutual information weighting to RL gradients for role-specific learning.
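As a rough starting point for the first item, the three Psy-CoT steps can be written into a prompt template. The step names come from the summary; the instruction wording under each step, the function name, and the "Response:" marker are hypothetical choices for illustration, not the authors' prompts.

```python
from typing import Tuple

# Step names are from the summary; the instruction text is an ASSUMPTION.
PSY_COT_STEPS: Tuple[Tuple[str, str], ...] = (
    ("Interaction Perception",
     "Note what the other party just said or did and what the scene calls for."),
    ("Psychological Empathy",
     "Infer what the character feels and wants, based on the profile."),
    ("Logical Construction",
     "Plan a reply that is consistent with that state and with the profile."),
)


def build_psy_cot_prompt(character: str, profile: str, dialogue: str) -> str:
    """Assemble a prompt that asks for three-step reasoning before the reply."""
    lines = [
        f"You are {character}.",
        "",
        "Character profile:",
        profile.strip(),
        "",
        "Dialogue so far:",
        dialogue.strip(),
        "",
        "Before replying, reason in three steps:",
    ]
    for number, (name, instruction) in enumerate(PSY_COT_STEPS, start=1):
        lines.append(f"{number}. {name}: {instruction}")
    lines.append("")
    lines.append(f"Then write {character}'s reply after the line 'Response:'.")
    return "\n".join(lines)


prompt = build_psy_cot_prompt(
    character="Captain Mora",  # hypothetical character
    profile="A weary ship captain who distrusts strangers but honors debts.",
    dialogue="Stranger: I need passage north. I can pay double.",
)
```

Keeping the steps in a single data structure makes it easy to log or score each reasoning stage separately during training.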

Topics

Role-Playing Agents
Chain-of-Thought Reasoning
Reinforcement Learning
Policy Optimization
Large Language Models
Character Fidelity

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.