EmoNet: Speaker-Aware Transformers for Emotion Recognition — and What I’d Build Differently in 2026

Source: Towards Data Science · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, medium

Summary

EmoNet, a model for Emotion Recognition in Conversation (ERC), reached a Weighted F1 of 39.18 on the EmoryNLP dataset in March 2024, 1.81 F1 points above its CoMPM baseline. ERC is hard because emotions in text-only dialogue depend on both conversational context and the speaker. EmoNet made three contributions: Global Speaker Identity, which assigns each speaker a stable ID across dialogues; a Speaker Behaviour Module, which uses a GRU to compress each speaker's history; and a weighted cross-entropy loss, which addresses class imbalance without distorting conversational sequences. Global Speaker Identity initially degraded performance, but its combination with the Speaker Behaviour Module is what ultimately delivered EmoNet's improvement. By 2026, the ERC field had moved to LLaMA-2–7B-based systems with LoRA fine-tuning and retrieval-augmented prompting, yet EmoNet's core intuitions about speaker-specific patterns persist, now expressed through LLM instruction tuning or retrieval contexts.
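The difference between per-dialogue speaker IDs and a Global Speaker Identity can be illustrated with a minimal sketch. This is not the article's implementation: the class name `GlobalSpeakerIndex`, the helper `local_speaker_ids`, and the toy dialogues are all hypothetical, and the sketch assumes speakers can be matched across dialogues by name.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: each dialogue is a list of (speaker_name, utterance).
Dialogue = List[Tuple[str, str]]


def local_speaker_ids(dialogue: Dialogue) -> List[int]:
    """Per-dialogue IDs: numbering restarts at 0 for every dialogue."""
    seen: Dict[str, int] = {}
    return [seen.setdefault(name, len(seen)) for name, _ in dialogue]


class GlobalSpeakerIndex:
    """Assigns one stable integer ID per speaker name across all dialogues."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def encode(self, dialogue: Dialogue) -> List[int]:
        # A name seen in an earlier dialogue keeps the ID it was given then.
        return [self._ids.setdefault(name, len(self._ids)) for name, _ in dialogue]


d1: Dialogue = [("alice", "Hi."), ("bob", "Hey!"), ("alice", "So...")]
d2: Dialogue = [("bob", "I can't believe it."), ("carol", "What happened?")]

local_d1, local_d2 = local_speaker_ids(d1), local_speaker_ids(d2)

index = GlobalSpeakerIndex()
global_d1, global_d2 = index.encode(d1), index.encode(d2)

print("local :", local_d1, local_d2)
print("global:", global_d1, global_d2)
```

With local numbering, "bob" is speaker 1 in the first dialogue and speaker 0 in the second, so a model cannot link the two. With the global index, "bob" keeps the same ID in both, which is what lets a downstream module accumulate history for that speaker.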

Key takeaway

If you are a machine learning engineer building conversational AI, treat speaker identity and historical context as critical inputs, even as model architectures change. When developing emotion recognition systems, consider integrating global speaker characteristics and their temporal behavior, for example via retrieval-augmented LLM prompts or instruction tuning, rather than relying solely on local dialogue context. Architectural intuitions about speaker patterns can be adapted across model paradigms.
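One way to carry speaker history into an LLM prompt is sketched below. Everything here is an assumption for illustration: the `SpeakerHistory` store, the `build_prompt` function, the prompt wording, and the label set are hypothetical, and the "retrieval" is a plain lookup by speaker with a recency cutoff standing in for a real retriever.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class SpeakerHistory:
    """Hypothetical store of each speaker's most recent labeled utterances."""

    def __init__(self, max_per_speaker: int = 3) -> None:
        self._store: Dict[str, Deque[Tuple[str, str]]] = defaultdict(
            lambda: deque(maxlen=max_per_speaker)
        )

    def add(self, speaker: str, utterance: str, emotion: str) -> None:
        # The deque silently drops the oldest entry once maxlen is reached.
        self._store[speaker].append((utterance, emotion))

    def recent(self, speaker: str) -> List[Tuple[str, str]]:
        # .get avoids creating an empty entry for unseen speakers.
        return list(self._store.get(speaker, ()))


def build_prompt(
    history: SpeakerHistory,
    context: List[Tuple[str, str]],
    speaker: str,
    utterance: str,
    labels: List[str],
) -> str:
    """Combine speaker history and local dialogue context into one prompt."""
    lines = [f"Classify the emotion of the final utterance as one of: {', '.join(labels)}."]
    past = history.recent(speaker)
    if past:
        lines.append(f"Earlier utterances by {speaker}:")
        lines += [f'- "{u}" -> {e}' for u, e in past]
    lines.append("Current dialogue:")
    lines += [f"{s}: {u}" for s, u in context]
    lines.append(f"{speaker}: {utterance}")
    lines.append("Emotion:")
    return "\n".join(lines)


history = SpeakerHistory(max_per_speaker=2)
history.add("alice", "I knew this would go wrong.", "sad")
history.add("alice", "Whatever, it's fine.", "neutral")
history.add("alice", "We actually did it!", "joyful")

prompt = build_prompt(
    history,
    context=[("bob", "The results are in.")],
    speaker="alice",
    utterance="Oh. Okay.",
    labels=["joyful", "sad", "neutral"],
)
print(prompt)
```

The design choice worth noting is that speaker history and local context are kept in separate prompt sections, so the model can weigh a speaker's usual behavior against what is happening in the current exchange.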

Key insights

Speaker-specific patterns and historical context are crucial for accurate emotion recognition in conversations.

Method

EmoNet combines RoBERTa embeddings, a GRU that models global, temporally decaying speaker history, and a weighted cross-entropy loss for imbalanced conversational data.
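The weighted cross-entropy component can be sketched in plain Python. The summary does not say how EmoNet derives its class weights, so the inverse-frequency scheme below (w_c = N / (K * n_c)) is an assumption, as are the function names and the toy labels and logits. The loss is normalized by the sum of the per-example weights.

```python
import math
from collections import Counter
from typing import List, Sequence


def inverse_frequency_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Assumed scheme: w_c = N / (K * n_c); classes with no examples get 0.0."""
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def weighted_cross_entropy(
    logits_batch: Sequence[Sequence[float]],
    labels: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Weighted mean of per-utterance losses, normalized by total weight."""
    num = 0.0
    den = 0.0
    for logits, y in zip(logits_batch, labels):
        num += -weights[y] * log_softmax(logits)[y]
        den += weights[y]
    return num / den


# Toy, imbalanced batch: class 1 is rare, and the model always favors class 0.
labels = [0, 0, 0, 1]
logits_batch = [[2.0, 0.0]] * 4

weights = inverse_frequency_weights(labels, num_classes=2)
weighted = weighted_cross_entropy(logits_batch, labels, weights)
unweighted = weighted_cross_entropy(logits_batch, labels, [1.0, 1.0])

print("class weights:", weights)
print("weighted loss > unweighted loss:", weighted > unweighted)
```

Because the reweighting happens inside the loss, every utterance stays in its original position in the dialogue. Resampling, by contrast, would have to drop or duplicate utterances, which is the sequence distortion the summary says the weighted loss avoids.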

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.