Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization
Summary
General-purpose role-playing agents often struggle with faithful character portrayal and out-of-distribution generalization because they rely on superficial behavioral mimicry. To address this, the authors propose Psy-CoT, a psychology-grounded chain-of-thought framework that decomposes pre-response reasoning into three role-specific steps: Interaction Perception, Psychological Empathy, and Logical Construction. The framework lets the agent reason dynamically from a character profile rather than pattern-match on surface behavior. They also introduce Role-Aware Policy Optimization (RAPO) to counter a failure mode in reinforcement learning where generic phrases "hack" the reward model. RAPO uses profile-token mutual information to weight gradients asymmetrically, amplifying role-specific tokens under positive advantage and attenuating them under negative advantage. Experiments on CoSER, CharacterBench, and CharacterEval show that Psy-CoT outperforms existing role-playing CoT methods and that RAPO consistently surpasses GRPO across multiple model scales.
Key takeaway
If you are a machine learning engineer whose role-playing agents struggle with character fidelity or out-of-distribution generalization, consider combining psychology-grounded reasoning with role-aware reinforcement learning. Psy-CoT's three-step reasoning framework targets character-consistent thinking, and RAPO's gradient weighting is designed to keep generic responses from exploiting the reward model. Together they can lead to more robust and believable character portrayals.
Key insights
Combining psychology-grounded reasoning with role-aware policy optimization improves the character fidelity of general role-playing agents in the reported experiments.
Principles
- Deep internal thought processes improve agent out-of-distribution generalization.
- Reinforcement learning is essential for aligning models with character fidelity.
- Generic phrases can "hack" LLM-based reward models, misleading RL training.
Method
Psy-CoT decomposes pre-response reasoning into Interaction Perception, Psychological Empathy, and Logical Construction. RAPO uses profile-token mutual information to weight gradients, amplifying role-specific tokens under positive advantage and attenuating them under negative advantage.
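The asymmetric weighting can be illustrated with a minimal sketch. This is not the paper's formulation: the summary does not give the exact equations, so the mutual-information proxy (a per-token log-probability difference with and without the profile in context), the tanh squashing, the `alpha` strength parameter, and all function names below are assumptions chosen only to show the direction of the effect.

```python
import math


def pmi_scores(logp_with_profile, logp_without_profile):
    """Assumed proxy for profile-token mutual information.

    For each token: log p(token | profile, context) - log p(token | context).
    Large positive values mean the profile makes the token much more likely,
    i.e. the token is role-specific; values near zero suggest a generic token.
    """
    return [a - b for a, b in zip(logp_with_profile, logp_without_profile)]


def role_aware_weights(pmi, advantage, alpha=0.5):
    """Hypothetical asymmetric weights for per-token policy gradients.

    Role-specific tokens are amplified (weight > 1) when the advantage is
    positive and attenuated (weight < 1) when it is negative. Generic tokens
    (PMI <= 0) keep weight 1 in both cases.
    """
    weights = []
    for score in pmi:
        # Squash to [0, 1); negative PMI is treated as "not role-specific".
        specificity = math.tanh(max(score, 0.0))
        if advantage > 0:
            weights.append(1.0 + alpha * specificity)
        elif advantage < 0:
            weights.append(1.0 - alpha * specificity)
        else:
            weights.append(1.0)
    return weights


def weighted_surrogate(logp_tokens, weights, advantage):
    """Token-averaged surrogate objective: mean of w_t * A * log pi(token_t)."""
    terms = [w * advantage * lp for w, lp in zip(weights, logp_tokens)]
    return sum(terms) / len(terms)


# Toy example: the first token is role-specific, the second is generic.
pmi = pmi_scores([-1.0, -2.0], [-3.0, -2.0])
print(role_aware_weights(pmi, advantage=1.0))
print(role_aware_weights(pmi, advantage=-1.0))
```

In this sketch a generic phrase receives no extra credit from a positive reward, while a role-specific token is penalized less when a response scores poorly, which is one way to read the asymmetric weighting described above.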
In practice
- Implement multi-step, psychology-grounded reasoning for agent responses.
- Apply mutual information weighting to RL gradients for role-specific learning.
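The first item above can be prototyped as a prompt scaffold. The following is a rough sketch only: the three step names come from the summary, but the one-line instruction attached to each step, the `<think>` delimiter, and the function name `build_psy_cot_prompt` are hypothetical and not taken from the paper.

```python
# Step names follow the summary; the instruction text is an assumed paraphrase.
PSY_COT_STEPS = [
    ("Interaction Perception",
     "Describe what is happening in the scene and what the other speaker wants."),
    ("Psychological Empathy",
     "Infer how the character feels and what motivates them, given the profile."),
    ("Logical Construction",
     "Plan a reply that is consistent with the character's traits and goals."),
]


def build_psy_cot_prompt(character_name, profile, dialogue):
    """Assemble a prompt that asks for three reasoning steps before the reply.

    `dialogue` is a list of (speaker, utterance) pairs.
    """
    history = "\n".join(f"{speaker}: {text}" for speaker, text in dialogue)
    steps = "\n".join(
        f"{i}. {name}: {instruction}"
        for i, (name, instruction) in enumerate(PSY_COT_STEPS, start=1)
    )
    return (
        f"You are playing {character_name}.\n"
        f"Character profile:\n{profile}\n\n"
        f"Conversation so far:\n{history}\n\n"
        "Before replying, reason inside <think></think> using these steps:\n"
        f"{steps}\n\n"
        f"Then write {character_name}'s reply in character."
    )


prompt = build_psy_cot_prompt(
    "Captain Mira",
    "A cautious starship captain who hides her doubts behind dry humor.",
    [("Ensign", "Captain, the engines are failing. What do we do?")],
)
print(prompt)
```

Keeping the steps in a single list makes it easy to reorder or ablate individual steps when comparing against a plain chain-of-thought baseline.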
Topics
- Role-Playing Agents
- Chain-of-Thought Reasoning
- Reinforcement Learning
- Policy Optimization
- Large Language Models
- Character Fidelity
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.