Dynamic Fusion-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition in Conversations

2026-03-25 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing, Computer Vision · Depth: Expert, extended

Summary

The paper introduces a Dynamic Fusion-aware Graph Convolutional Neural Network (DF-GCN) for multimodal emotion recognition in conversations (MERC). Existing GCN-based methods fuse multimodal features with the same fixed parameters for every emotion type, which often compromises performance on specific emotions. DF-GCN integrates ordinary differential equations (ODEs) into GCNs to capture dynamic emotional dependencies within utterance interaction networks. It also uses prompts generated from a global information vector (GIV) to guide the fusion of multimodal features, so that the fusion parameters adapt to different emotion categories during inference. In experiments on the IEMOCAP and MELD datasets, DF-GCN outperforms existing mainstream methods, particularly on weighted accuracy (WA) and weighted F1 (WF1), while maintaining comparable computational efficiency.

Key takeaway

Research scientists developing multimodal emotion recognition systems should consider integrating dynamic fusion mechanisms, such as those in DF-GCN, to overcome the limitations of static-parameter models. Letting network parameters adapt to different emotion categories during inference can make emotion classification more flexible and accurate on challenging datasets such as IEMOCAP and MELD, especially for minority emotion classes.

Key insights

Dynamic fusion of multimodal features via ODE-integrated GCNs improves conversational emotion recognition.

Principles

Emotional states evolve continuously, not discretely.
Global context guides adaptive multimodal feature fusion.
Dynamic parameters enhance model generalization.

Method

DF-GCN combines two ODE-based blocks: a Static Graph Convolution (SGCODE) block and a Dynamic Graph Convolution (DGCODE) block. It generates a Global Information Vector (GIV) with a Transformer followed by global average pooling, then feeds the GIV to a Prompt Generation Network (PGN) that produces the dynamic weights used for adaptive fusion in the DGCODE block.
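
The two ideas in this method, continuous-depth graph propagation and prompt-weighted fusion, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The ODE form dH/dt = A_hat H - H + H0 (diffusion with a restart term), the fixed-step Euler solver, the softmax over modalities, and the names `ode_graph_propagate`, `prompt_weights`, and `w_prompt` are all assumptions chosen for illustration. The Transformer that precedes pooling in the paper is omitted.

```python
import math

def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def row_normalize(adj):
    """Scale each row of a non-negative adjacency matrix to sum to 1."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def ode_graph_propagate(adj, h0, t_end=1.0, steps=10):
    """Euler-integrate dH/dt = A_hat @ H - H + H0 from t = 0 to t_end.

    The ODE form and the solver are illustrative assumptions, not the paper's.
    """
    a_hat = row_normalize(adj)
    dt = t_end / steps
    h = [row[:] for row in h0]
    for _ in range(steps):
        ah = matmul(a_hat, h)
        h = [[hv + dt * (av - hv + sv) for hv, av, sv in zip(h_row, a_row, s_row)]
             for h_row, a_row, s_row in zip(h, ah, h0)]
    return h

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def global_information_vector(nodes):
    """Global average pooling over utterance nodes (Transformer step omitted)."""
    return [sum(col) / len(nodes) for col in zip(*nodes)]

def prompt_weights(giv, w_prompt):
    """Map the GIV to one fusion weight per modality.

    w_prompt is a hypothetical (modalities x features) parameter matrix.
    """
    logits = [sum(w * g for w, g in zip(row, giv)) for row in w_prompt]
    return softmax(logits)

def dynamic_fuse(modalities, weights):
    """Weighted sum of same-shaped per-modality feature matrices."""
    n, d = len(modalities[0]), len(modalities[0][0])
    return [[sum(w * m[i][j] for w, m in zip(weights, modalities)) for j in range(d)]
            for i in range(n)]

# Toy conversation: 3 utterances, 2 features per modality (made-up numbers).
adj = [[1.0, 0.8, 0.0],
       [0.8, 1.0, 0.5],
       [0.0, 0.5, 1.0]]
text = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
audio = [[0.2, 0.1], [0.4, 0.3], [0.1, 0.9]]
video = [[0.0, 0.3], [0.6, 0.2], [0.3, 0.3]]

propagated = [ode_graph_propagate(adj, m) for m in (text, audio, video)]
# Assumption: the GIV is pooled from the unweighted mean of the modalities.
giv = global_information_vector(dynamic_fuse(propagated, [1 / 3] * 3))
w_prompt = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]  # hypothetical, untrained
weights = prompt_weights(giv, w_prompt)
fused = dynamic_fuse(propagated, weights)
```

Because `weights` is recomputed from each conversation's GIV, the fusion changes with the input at inference time instead of staying fixed after training, which is the property the method relies on.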

In practice

Use RoBERTa (text), openSMILE (audio), and DenseNet (video) for initial feature encoding.
Employ a Bi-GRU for text context and fully connected (FC) networks for audio and video.
Construct emotional interaction graphs using cosine similarity.
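
The graph-construction step in the list above can be sketched as follows. This is a hedged illustration rather than the paper's exact procedure: the context `window`, the self-loops, the clipping of negative similarities to zero, and the function name `build_interaction_graph` are assumptions introduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def build_interaction_graph(features, window=2):
    """Return a weighted adjacency matrix over utterances.

    features: per-utterance feature vectors in conversation order.
    window:   hypothetical context size; utterance i is linked to utterances
              at most `window` turns before or after it.
    Edge weights are cosine similarities, with negatives clipped to 0
    (the clipping is an assumption made for this sketch).
    """
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if i == j:
                adj[i][j] = 1.0  # self-loop
            else:
                adj[i][j] = max(0.0, cosine_similarity(features[i], features[j]))
    return adj

# Toy example: four utterances with 2-dimensional features (made-up numbers).
utterances = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
graph = build_interaction_graph(utterances, window=1)
```

A small window keeps the graph sparse and local to the dialogue context; a larger one trades extra computation for longer-range dependencies.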

Topics

Multimodal Emotion Recognition
Graph Convolutional Networks
Dynamic Fusion
Neural Ordinary Differential Equations
Prompt Learning

Code references

yuntaoshou/DFGCN

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.