Micro-DualNet: Dual-Path Spatio-Temporal Network for Micro-Action Recognition

Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, extended

Summary

Micro-DualNet is a dual-path spatio-temporal network for micro-action recognition, the task of classifying subtle, localized movements that last roughly 1 to 3 seconds. It targets a specific difficulty: micro-actions are heterogeneous, with some defined mainly by spatial configuration (e.g., "covering face") and others mainly by temporal dynamics (e.g., "leg shaking"). The network represents the body as anatomically grounded spatial entities (such as head, hands, and torso) and processes them through two parallel pathways. The Spatial-Temporal (ST) path models spatial configuration before temporal dynamics, and the Temporal-Spatial (TS) path reverses this order. Entity-level adaptive routing lets each body part learn which processing order it prefers, and a Mutual Action Consistency (MAC) loss enforces coherence between the two paths. The model achieves competitive performance on the MA-52 dataset (65.10% Top-1, 68.72% F1_mean) and state-of-the-art results on the iMiGUE dataset (76.88% Top-1). In a clinical validation on an in-house dataset of 290 individuals, micro-actions detected by Micro-DualNet revealed statistically significant behavioral differences among children with autism spectrum disorder, children with other psychiatric conditions, and typically developing children.
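
Why the order of the two paths matters can be illustrated with a toy example. The sketch below is not the paper's architecture: the real ST and TS paths are transformer pathways with learned parameters, whereas here a parameter-free scaled dot-product self-attention (no learned query/key/value projections, no feed-forward layers) stands in for each stage. The layout x[t][e] (frame t, entity e, each a feature vector) and all function names are hypothetical. Because softmax attention is non-linear, "spatial then temporal" and "temporal then spatial" generally produce different outputs.

```python
import math
from typing import List

# Hypothetical layout: x[t][e] is the feature vector (list of floats)
# for frame t and body entity e (e.g., head, hands, torso).
Tensor = List[List[List[float]]]


def self_attention(tokens: List[List[float]]) -> List[List[float]]:
    """Parameter-free scaled dot-product self-attention over a token list.

    Each output token is a softmax-weighted average of all input tokens,
    weighted by dot-product similarity to the query token.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) / z for j in range(d)])
    return out


def spatial_mix(x: Tensor) -> Tensor:
    # Attend across entities independently within each frame.
    return [self_attention(frame) for frame in x]


def temporal_mix(x: Tensor) -> Tensor:
    # Attend across frames independently for each entity.
    num_frames, num_entities = len(x), len(x[0])
    per_entity = [self_attention([x[t][e] for t in range(num_frames)]) for e in range(num_entities)]
    return [[per_entity[e][t] for e in range(num_entities)] for t in range(num_frames)]


def st_path(x: Tensor) -> Tensor:
    # Spatial-Temporal: spatial configuration first, then temporal dynamics.
    return temporal_mix(spatial_mix(x))


def ts_path(x: Tensor) -> Tensor:
    # Temporal-Spatial: temporal dynamics first, then spatial configuration.
    return spatial_mix(temporal_mix(x))


if __name__ == "__main__":
    x = [[[1.0], [0.0]], [[0.0], [0.0]]]  # 2 frames, 2 entities, 1-dim features
    print("ST:", st_path(x))
    print("TS:", ts_path(x))
```

On this input the two paths disagree, for example on the second entity in the first frame (about 0.28 for ST versus about 0.37 for TS), which is the kind of order sensitivity the dual-path design exploits.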

Key takeaway

For research scientists building fine-grained video understanding systems, Micro-DualNet offers an effective approach to micro-action recognition. Consider implementing dual spatio-temporal processing paths with adaptive routing for different body parts: the reported results indicate that this architecture improves accuracy, particularly on challenging, ambiguous actions. The method also shows promise for automated behavioral assessment in clinical settings, suggesting applicability beyond benchmark performance.
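
Entity-level adaptive routing can be sketched as a per-entity gate that blends the two paths. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes each entity has already been pooled to one feature vector per path, and that routing is a single learnable scalar logit per entity, passed through a sigmoid to weight the ST path against the TS path. The names `route_entities` and `route_logits` are hypothetical, and the actual router may be input-dependent or structured differently.

```python
import math
from typing import Dict, List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def route_entities(
    st_feats: Dict[str, List[float]],
    ts_feats: Dict[str, List[float]],
    route_logits: Dict[str, float],
) -> Dict[str, List[float]]:
    """Blend ST and TS features per entity with a scalar gate (hypothetical).

    alpha = sigmoid(logit) is the weight on the ST path for that entity and
    1 - alpha is the weight on the TS path. In training, route_logits would
    be learnable parameters updated by backpropagation.
    """
    fused = {}
    for entity, st in st_feats.items():
        alpha = sigmoid(route_logits[entity])
        ts = ts_feats[entity]
        fused[entity] = [alpha * s + (1.0 - alpha) * t for s, t in zip(st, ts)]
    return fused


if __name__ == "__main__":
    st = {"head": [1.0, 0.0], "hands": [1.0, 0.0], "torso": [1.0, 0.0]}
    ts = {"head": [0.0, 1.0], "hands": [0.0, 1.0], "torso": [0.0, 1.0]}
    # Illustrative logits: head leans ST, hands are neutral, torso leans TS.
    logits = {"head": 2.0, "hands": 0.0, "torso": -2.0}
    print(route_entities(st, ts, logits))
```

A two-way softmax over per-path logits is mathematically equivalent to this sigmoid gate, so the single-logit form is simply the most compact way to express a learned preference between two paths.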

Key insights

Micro-DualNet uses dual spatio-temporal pathways and adaptive routing to recognize subtle, diverse micro-actions effectively.

Principles

Method

Micro-DualNet extracts keypoint-guided entity features, processes them via parallel Spatial-Temporal and Temporal-Spatial transformer pathways, and fuses outputs using entity-level adaptive routing, regularized by Mutual Action Consistency loss.
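
The summary does not give the MAC loss formula. One plausible form of a cross-path consistency term, assumed here purely for illustration, is the symmetric Kullback-Leibler divergence between the class distributions predicted by the ST and TS heads, added to a per-path cross-entropy with a hypothetical weight `lam`. The function names and the weighting scheme are assumptions; the actual MAC loss may, for example, compare intermediate features rather than predictions or use a different divergence.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    # KL(p || q); eps guards against log(0) and division by zero.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(logits: List[float], label: int) -> float:
    return -math.log(softmax(logits)[label])


def mac_consistency(st_logits: List[float], ts_logits: List[float]) -> float:
    # Hypothetical stand-in for the MAC loss: symmetric KL between the
    # ST and TS class distributions. It is zero when both paths agree.
    p, q = softmax(st_logits), softmax(ts_logits)
    return 0.5 * (kl(p, q) + kl(q, p))


def total_loss(st_logits: List[float], ts_logits: List[float], label: int, lam: float = 0.1) -> float:
    # Per-path classification loss plus a weighted consistency term.
    # lam is a hypothetical hyperparameter, not a value from the paper.
    return (
        cross_entropy(st_logits, label)
        + cross_entropy(ts_logits, label)
        + lam * mac_consistency(st_logits, ts_logits)
    )


if __name__ == "__main__":
    st_logits = [2.0, 0.5, -1.0]
    ts_logits = [0.0, 1.5, 0.2]
    print("MAC term:", mac_consistency(st_logits, ts_logits))
    print("total loss:", total_loss(st_logits, ts_logits, label=0))
```

Using a symmetric divergence avoids designating either path as the "teacher", which matches the stated goal of mutual, cross-path coherence.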


Topics

Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Machine Learning Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.