EIBench: A Simulator-Based Benchmark and Turn-Credit RL for Emotion Management
Summary
EIBench is a simulator-based benchmark for evaluating and training large language models (LLMs) on interactive emotion management. Unlike static, single-response evaluations, it scores multi-turn interactions in which the model's goal is to improve a user's emotional and relational state over the course of the conversation. It comprises 2,222 scenarios (2,009 for training and 213 for testing), organized by a 2x2 taxonomy into four categories: Support, Defense, Repair, and Charm. An LLM simulator plays the user and updates an emotion-relation state after each turn; the final state is then mapped to an anchor-based score. This design yields both an outcome reward for the whole episode and dense per-turn feedback for reinforcement learning (RL). Across 15 evaluated LLMs, performance was strong on support and rapport-building but weaker on maintaining boundaries under pressure. To address this, the authors propose Centered Turn-Credit GRPO (CTC-GRPO), an extension of GRPO that leverages the simulator's per-turn state updates. CTC-GRPO raised Qwen3-8B's EIBench score from -22.4 to +22.4 and, in out-of-distribution tests, showed gains on SAGE (+12.4) and EQBench3 (+20.9%).
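The state-tracking and scoring loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not EIBench's implementation: the two-field `EmotionRelationState`, the [-1, 1] ranges, the `ANCHORS` table, and all helper names are hypothetical. It shows how one trajectory of simulator updates yields both a dense per-turn signal (the change in state at each turn) and a single outcome score (the final state mapped through anchors).

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical user state; EIBench's actual state schema may differ.
@dataclass(frozen=True)
class EmotionRelationState:
    emotion: float   # assumed range: -1.0 (distressed) .. +1.0 (calm/positive)
    relation: float  # assumed range: -1.0 (hostile) .. +1.0 (close/trusting)


def clip(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Keep a state component inside its assumed range."""
    return max(lo, min(hi, x))


def apply_update(s: EmotionRelationState, d_emotion: float, d_relation: float) -> EmotionRelationState:
    """Apply one simulator update (in EIBench, an LLM simulator produces it after each agent turn)."""
    return EmotionRelationState(clip(s.emotion + d_emotion), clip(s.relation + d_relation))


# Hypothetical anchors: (combined state value, benchmark score), sorted by value.
ANCHORS: List[Tuple[float, float]] = [(-1.0, -100.0), (-0.3, -40.0), (0.3, 40.0), (1.0, 100.0)]


def anchor_score(s: EmotionRelationState) -> float:
    """Map a final state to a score by piecewise-linear interpolation between anchors."""
    value = 0.5 * (s.emotion + s.relation)
    for (x0, y0), (x1, y1) in zip(ANCHORS, ANCHORS[1:]):
        if x0 <= value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    raise ValueError(f"state value {value} is outside the anchor range")


def rollout_signals(
    initial: EmotionRelationState, updates: List[Tuple[float, float]]
) -> Tuple[List[float], float]:
    """Return (dense per-turn rewards, outcome score) for one simulated conversation."""
    states = [initial]
    for d_e, d_r in updates:
        states.append(apply_update(states[-1], d_e, d_r))
    # Per-turn reward = realized change in the combined state (after clipping).
    per_turn = [
        (s1.emotion - s0.emotion) + (s1.relation - s0.relation)
        for s0, s1 in zip(states, states[1:])
    ]
    return per_turn, anchor_score(states[-1])


if __name__ == "__main__":
    turns, outcome = rollout_signals(
        EmotionRelationState(0.0, 0.0), [(0.2, 0.1), (-0.1, 0.0), (0.3, 0.2)]
    )
    print(turns, outcome)
```

Because the per-turn rewards telescope to the total change in state, the dense signal and the outcome score stay consistent by construction in this sketch; the anchor mapping only reshapes how the final state translates into a benchmark number.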
Key takeaway
If you are a machine learning engineer developing emotionally intelligent LLMs, consider EIBench for evaluating how a model manages a user's emotional state across a multi-turn interaction rather than in a single reply. Your models are likely to do well on support tasks but may need targeted training to maintain boundaries under user pressure. Centered Turn-Credit GRPO (CTC-GRPO) is one way to supply that training: it raised Qwen3-8B's EIBench score from -22.4 to +22.4.
Key insights
Simulator-tracked user states provide dense, multi-turn feedback crucial for training LLMs in interactive emotion management.
Principles
- Emotion management requires interactive, multi-turn evaluation.
- Dense per-turn feedback enhances RL for emotional intelligence.
- LLMs struggle with boundary maintenance under pressure.
Method
Centered Turn-Credit GRPO (CTC-GRPO) extends GRPO by reusing the simulator's per-turn state updates as dense feedback while preserving the final outcome reward for multi-turn emotion management.
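The method can be made concrete with a small advantage computation. The sketch below is one plausible reading, not the paper's formulation: it assumes standard GRPO group normalization of the final outcome reward, interprets "centered turn credit" as each turn's simulator reward minus the mean turn reward of the same rollout, and mixes the two with a hypothetical weight `beta`. Under this assumption the centered credits sum to zero within a rollout, so they redistribute credit across turns without changing the rollout's average advantage, which is one way to add dense feedback while preserving the outcome reward.

```python
import statistics
from typing import List


def group_outcome_advantages(outcome_rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize outcome rewards within a group of rollouts."""
    mean = statistics.fmean(outcome_rewards)
    std = statistics.pstdev(outcome_rewards)
    return [(r - mean) / (std + eps) for r in outcome_rewards]


def centered_turn_credits(turn_rewards: List[float]) -> List[float]:
    """Assumed centering: subtract the rollout's mean per-turn reward so credits sum to zero."""
    mean = statistics.fmean(turn_rewards)
    return [r - mean for r in turn_rewards]


def ctc_advantages(
    outcome_rewards: List[float],
    turn_rewards: List[List[float]],
    beta: float = 0.5,  # hypothetical mixing weight for the dense term
    eps: float = 1e-6,
) -> List[List[float]]:
    """Per-turn advantages for each rollout in a group (all rollouts share one scenario).

    In this sketch, every token the policy generates in turn t of rollout i would be
    trained with advantages[i][t] in the usual clipped policy-gradient objective.
    """
    if len(outcome_rewards) != len(turn_rewards):
        raise ValueError("need one list of turn rewards per rollout")
    outcome_adv = group_outcome_advantages(outcome_rewards, eps)
    return [
        [a_i + beta * c for c in centered_turn_credits(turns)]
        for a_i, turns in zip(outcome_adv, turn_rewards)
    ]


if __name__ == "__main__":
    # Two rollouts of the same scenario: final outcome scores and per-turn state deltas.
    adv = ctc_advantages([10.0, -10.0], [[0.3, -0.1, 0.4], [0.0, -0.2, 0.2]])
    for row in adv:
        print([round(a, 3) for a in row])
```

With these inputs, the first rollout, which had the better outcome, receives a positive advantage on every turn, but its second turn (the one that lowered the user's state) receives less credit than the other two.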
In practice
- Use EIBench for interactive emotion management evaluation.
- Apply CTC-GRPO to improve LLM emotional intelligence.
- Focus LLM training on boundary maintenance scenarios.
Topics
- Emotional Intelligence
- Large Language Models
- Reinforcement Learning
- LLM Benchmarking
- Multi-turn Dialogue
- CTC-GRPO
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.