Deep Temporal Modeling and Ensemble Fusion for Multimodal Emotion Recognition from Physiological Signals

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A study evaluated three deep learning architectures, Long Short-Term Memory (LSTM) networks, Temporal Convolutional Networks (TCN), and Transformers, for multimodal emotion recognition from physiological signals. Using the WESAD dataset, the research worked with wrist and chest sensor data. Ablation studies trained models on wrist-only and chest-only inputs to measure each modality's contribution. The work also explored early fusion, which concatenates the sensor signals, and a late-fusion ensemble that combines the predictions of all three architectures. Transformer models achieved the highest accuracy in multimodal settings, while TCNs performed best in wrist-only configurations. The late-fusion ensemble yielded the highest overall accuracy, 98.91 ± 0.13%, with a macro-F1 score of 98.56 ± 0.17%.

Key takeaway

Machine learning engineers building physiological emotion recognition systems should consider ensemble-based late fusion, which gave the highest accuracy in this study. For multimodal wrist and chest sensor data, Transformer models performed best. For wrist-only deployments, TCNs are a strong alternative.

Key insights

Combining deep temporal models with sensor-level (early) fusion and ensemble (late) fusion improves physiological emotion recognition accuracy on WESAD, with the late-fusion ensemble reaching 98.91 ± 0.13%.

Method

The study evaluated LSTM, TCN, and Transformer models on WESAD using wrist and chest signals. It ran wrist-only and chest-only ablations, early fusion by concatenating the sensor signals, and a late-fusion ensemble over the three models' predictions.
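
The two fusion strategies can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the function names `early_fusion` and `late_fusion`, the array shapes, and the toy probabilities are hypothetical. The summary does not say how the ensemble combines predictions, so the sketch assumes simple averaging of class probabilities (soft voting). It also assumes the wrist and chest streams have already been windowed to a common length.

```python
import numpy as np


def early_fusion(wrist, chest):
    """Concatenate wrist and chest windows along the channel axis.

    wrist: (n_windows, n_timesteps, wrist_channels)
    chest: (n_windows, n_timesteps, chest_channels)
    Assumes both streams share the same window count and length.
    """
    if wrist.shape[:2] != chest.shape[:2]:
        raise ValueError("wrist and chest must share window count and length")
    return np.concatenate([wrist, chest], axis=-1)


def late_fusion(prob_list):
    """Average per-model class probabilities, then pick the top class.

    Averaging is an assumption; the source only says predictions are combined.
    """
    stacked = np.stack(prob_list, axis=0)  # (n_models, n_windows, n_classes)
    mean_probs = stacked.mean(axis=0)      # (n_windows, n_classes)
    return mean_probs.argmax(axis=1)


# Toy inputs: 4 windows of 32 timesteps, 3 wrist and 5 chest channels.
rng = np.random.default_rng(0)
wrist = rng.normal(size=(4, 32, 3))
chest = rng.normal(size=(4, 32, 5))
fused = early_fusion(wrist, chest)  # shape (4, 32, 8)

# Hypothetical class probabilities from the three models for 2 windows.
lstm_probs = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
tcn_probs = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
transformer_probs = np.array([[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]])
ensemble_labels = late_fusion([lstm_probs, tcn_probs, transformer_probs])
print(fused.shape, ensemble_labels.tolist())  # (4, 32, 8) [0, 2]
```

In the second toy window the LSTM alone would pick class 1, but the averaged probabilities favor class 2, which is the kind of disagreement a late-fusion ensemble is meant to resolve.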

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.