EIBench: A Simulator-Based Benchmark and Turn-Credit RL for Emotion Management

2026-06-14 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

EIBench is a simulator-based benchmark for evaluating and training Large Language Models (LLMs) on interactive emotion management. Unlike static evaluations, it scores multi-turn interactions in which the model's goal is to improve a simulated user's emotional and relational state over the course of the conversation. It comprises 2,222 scenarios (2,009 for training, 213 for testing), organized by a 2x2 taxonomy into four categories: Support, Defense, Repair, and Charm. An LLM simulator plays the user, updates an emotion-relation state after each turn, and maps the final state to an anchor-based score. This design yields both an outcome reward and dense per-turn feedback for Reinforcement Learning (RL). An evaluation of 15 LLMs found strong performance on support and rapport-building but weakness at maintaining boundaries under pressure. To address this, the authors propose Centered Turn-Credit GRPO (CTC-GRPO), an extension of GRPO that exploits the simulator's per-turn state updates. CTC-GRPO raised Qwen3-8B's EIBench score from -22.4 to +22.4 and, in out-of-distribution tests, improved its results on SAGE (+12.4) and EQBench3 (+20.9%).
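
To make the reward structure concrete, the following is a minimal sketch of how one simulated episode can yield both a final outcome score and per-turn feedback. It is an illustration under stated assumptions, not EIBench's code: the `State` fields, the `rollout` helper, the simulator-step signature, and `score_fn` (a stand-in for the anchor-based mapping) are all hypothetical, and per-turn feedback is assumed here to be the change in score after each turn.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical emotion-relation state; EIBench's real fields may differ."""
    emotion: float
    relation: float


# Hypothetical simulator step: (current state, agent reply) -> (new state, next user message)
SimStep = Callable[[State, str], Tuple[State, str]]


def rollout(
    policy: Callable[[str], str],
    sim_step: SimStep,
    init: State,
    opening: str,
    score_fn: Callable[[State], float],  # stand-in for the anchor-based mapping
    max_turns: int = 5,
) -> Tuple[float, List[float]]:
    """Run one episode; return (final outcome score, per-turn score deltas)."""
    state, user_msg = init, opening
    per_turn: List[float] = []  # dense feedback, one value per agent turn
    for _ in range(max_turns):
        reply = policy(user_msg)
        new_state, user_msg = sim_step(state, reply)
        # Assumed per-turn signal: how much this turn moved the scored state.
        per_turn.append(score_fn(new_state) - score_fn(state))
        state = new_state
    return score_fn(state), per_turn


if __name__ == "__main__":
    # Toy simulator (hypothetical): apologies raise emotion, anything else lowers it.
    def toy_sim(state: State, reply: str) -> Tuple[State, str]:
        delta = 1.0 if "sorry" in reply.lower() else -1.0
        return State(state.emotion + delta, state.relation), "ok"

    final, deltas = rollout(
        policy=lambda msg: "I'm sorry",
        sim_step=toy_sim,
        init=State(0.0, 0.0),
        opening="hi",
        score_fn=lambda s: s.emotion + s.relation,
        max_turns=3,
    )
    print(final, deltas)
```

Because each per-turn value is a score difference, the list telescopes: it sums to the final score minus the initial score, so under this assumption the dense signal stays consistent with the outcome reward.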

Key takeaway

For Machine Learning Engineers developing emotionally intelligent LLMs, EIBench is worth considering as a benchmark for multi-turn interaction. Based on the authors' evaluation of 15 LLMs, your models may already perform well on support tasks but need targeted training to hold boundaries under user pressure. Centered Turn-Credit GRPO (CTC-GRPO) is one way to provide that training: it raised Qwen3-8B's EIBench score from -22.4 to +22.4.

Key insights

Simulator-tracked user states provide dense, multi-turn feedback crucial for training LLMs in interactive emotion management.

Principles

Emotion management requires interactive, multi-turn evaluation.
Dense per-turn feedback enhances RL for emotional intelligence.
Current LLMs struggle to maintain boundaries under pressure.

Method

Centered Turn-Credit GRPO (CTC-GRPO) extends GRPO by reusing the simulator's per-turn state updates as dense feedback, while preserving the final outcome reward for multi-turn emotion management.
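
The following is a minimal sketch of how a centered turn-credit term might combine with GRPO's group-relative outcome advantage. Only the standard GRPO step (normalizing each rollout's outcome reward by the mean and standard deviation of its group) is well established; the centering rule (subtracting the group-wide mean per-turn delta), the additive combination, and the weight `beta` are assumptions for illustration, and the paper's exact formulation may differ. All function names are hypothetical.

```python
import statistics
from typing import List


def grpo_outcome_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantage: z-score each rollout's outcome within its group."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def centered_turn_credit(turn_deltas: List[List[float]]) -> List[List[float]]:
    """Assumed centering: subtract the mean per-turn delta over all turns in the group."""
    flat = [d for traj in turn_deltas for d in traj]
    mu = statistics.fmean(flat)
    return [[d - mu for d in traj] for traj in turn_deltas]


def ctc_advantages(
    rewards: List[float],
    turn_deltas: List[List[float]],
    beta: float = 0.5,  # hypothetical weight on the dense term
) -> List[List[float]]:
    """Per-turn advantage = outcome advantage of the rollout + beta * centered turn credit."""
    outcome = grpo_outcome_advantages(rewards)
    credit = centered_turn_credit(turn_deltas)
    return [
        [a + beta * c for c in traj_credit]
        for a, traj_credit in zip(outcome, credit)
    ]


if __name__ == "__main__":
    # Two rollouts of the same scenario: one improved the user's state, one worsened it.
    adv = ctc_advantages(rewards=[1.0, -1.0], turn_deltas=[[1.0, 1.0], [-1.0, -1.0]])
    print(adv)
```

In this sketch, centering makes the turn credits sum to zero across the group, so the dense term redistributes credit among turns without shifting the overall baseline set by the outcome reward.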

In practice

Use EIBench for interactive emotion management evaluation.
Apply CTC-GRPO to improve LLM emotional intelligence.
Focus LLM training on boundary maintenance scenarios.

Topics

Emotional Intelligence
Large Language Models
Reinforcement Learning
LLM Benchmarking
Multi-turn Dialogue
CTC-GRPO

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.