Micro-DualNet: Dual-Path Spatio-Temporal Network for Micro-Action Recognition
Summary
Micro-DualNet is a dual-path spatio-temporal network for micro-action recognition, the task of classifying subtle, localized movements that last 1-3 seconds. It targets a core difficulty: micro-actions differ in what defines them, with some characterized mainly by spatial configuration (e.g., "covering face") and others by temporal dynamics (e.g., "leg shaking"). The network processes anatomically grounded spatial entities (e.g., head, hands, torso) through parallel Spatial-Temporal (ST) and Temporal-Spatial (TS) pathways. The ST path models spatial configurations before temporal dynamics, while the TS path reverses this order. Micro-DualNet adds entity-level adaptive routing, which lets each body part learn its preferred processing order, and a Mutual Action Consistency (MAC) loss that enforces coherence between the two paths. The model achieves competitive performance on the MA-52 dataset (65.10% Top-1, 68.72% F1$_{\text{mean}}$) and state-of-the-art results on the iMiGUE dataset (76.88% Top-1). Clinical validation on an in-house dataset of 290 individuals further showed that micro-actions detected by Micro-DualNet reveal statistically significant behavioral differences among children with autism spectrum disorder, children with other psychiatric conditions, and typically developing children.
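The difference between the two processing orders can be illustrated with a minimal pure-Python sketch. It assumes clip features arranged as `clip[frame][entity]` vectors and uses single-head self-attention with identity query/key/value projections as a stand-in for the paper's transformer blocks (no learned weights, residual connections, or feed-forward layers). `st_path` attends across entities within each frame and then across frames per entity; `ts_path` applies the same two operations in the reverse order. All names here are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Clip = List[List[Vector]]  # indexed as clip[frame][entity] -> feature vector


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)])
    return outputs


def spatial_block(clip: Clip) -> Clip:
    # Attend across entities within each frame.
    return [self_attention(frame) for frame in clip]


def temporal_block(clip: Clip) -> Clip:
    # Attend across frames separately for each entity, then restore [frame][entity] order.
    num_frames, num_entities = len(clip), len(clip[0])
    per_entity = [
        self_attention([clip[t][e] for t in range(num_frames)]) for e in range(num_entities)
    ]
    return [[per_entity[e][t] for e in range(num_entities)] for t in range(num_frames)]


def st_path(clip: Clip) -> Clip:
    # Spatial-Temporal: spatial configuration first, then temporal dynamics.
    return temporal_block(spatial_block(clip))


def ts_path(clip: Clip) -> Clip:
    # Temporal-Spatial: temporal dynamics first, then spatial configuration.
    return spatial_block(temporal_block(clip))
```

Because attention is nonlinear, the two orders generally yield different features for the same input clip, which is what gives each path room to specialize.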
Key takeaway
For research scientists building fine-grained video understanding systems, Micro-DualNet offers a robust approach to micro-action recognition. Consider dual spatio-temporal processing paths with adaptive routing for different body parts: the authors report that this architecture improves accuracy, especially on challenging, ambiguous actions. The method also shows promise for automated behavioral assessment in clinical settings, suggesting applicability beyond benchmark performance.
Key insights
Micro-DualNet uses dual spatio-temporal pathways and adaptive routing to recognize subtle, diverse micro-actions effectively.
Principles
- Micro-actions require flexible spatio-temporal decomposition.
- Adaptive entity extraction improves performance over fixed regions.
- Cross-path consistency enhances specialized representations.
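The summary does not give the exact form of the MAC loss. One plausible reading of "cross-path consistency" is sketched below under that assumption: a symmetric KL divergence between the class distributions predicted by the ST and TS paths, added to a standard cross-entropy term with a hypothetical weight `lam`. This is a stand-in for illustration, not the paper's formula, and the function names are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    # KL(p || q); eps guards against log(0) for near-zero probabilities.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(logits: List[float], label: int) -> float:
    return -math.log(softmax(logits)[label])


def mac_consistency_loss(st_logits: List[float], ts_logits: List[float]) -> float:
    # Hypothetical stand-in for the MAC loss: symmetric KL between the two paths' predictions.
    p_st, p_ts = softmax(st_logits), softmax(ts_logits)
    return 0.5 * (kl_divergence(p_st, p_ts) + kl_divergence(p_ts, p_st))


def total_loss(
    st_logits: List[float],
    ts_logits: List[float],
    fused_logits: List[float],
    label: int,
    lam: float = 0.1,
) -> float:
    # Classification loss on the fused prediction plus a weighted consistency term.
    # `lam` is an illustrative weight, not a value from the paper.
    return cross_entropy(fused_logits, label) + lam * mac_consistency_loss(st_logits, ts_logits)
```

A symmetric divergence is used here so that neither path is treated as the "teacher"; the consistency term is zero only when both paths predict the same distribution.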
Method
Micro-DualNet extracts keypoint-guided entity features, processes them through parallel Spatial-Temporal and Temporal-Spatial transformer pathways, and fuses the outputs with entity-level adaptive routing, with training regularized by the Mutual Action Consistency loss.
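Entity-level adaptive routing can be sketched as a per-entity soft gate over the two paths. The sketch below assumes each entity holds a pair of learnable logits that a two-way softmax turns into mixing weights for its ST and TS features; the entity names, the `routing_logits` structure, and the gating form are assumptions for illustration, not the paper's implementation.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def route_weights(logit_st: float, logit_ts: float) -> Tuple[float, float]:
    # Two-way softmax over the per-entity routing logits.
    peak = max(logit_st, logit_ts)
    a, b = math.exp(logit_st - peak), math.exp(logit_ts - peak)
    return a / (a + b), b / (a + b)


def adaptive_fuse(
    st_feats: Dict[str, Vector],
    ts_feats: Dict[str, Vector],
    routing_logits: Dict[str, Tuple[float, float]],
) -> Dict[str, Vector]:
    # Each entity mixes its ST and TS features with its own weights.
    fused = {}
    for entity, st_vec in st_feats.items():
        w_st, w_ts = route_weights(*routing_logits[entity])
        fused[entity] = [w_st * s + w_ts * t for s, t in zip(st_vec, ts_feats[entity])]
    return fused


if __name__ == "__main__":
    st = {"head": [1.0, 0.0], "legs": [1.0, 0.0]}
    ts = {"head": [0.0, 1.0], "legs": [0.0, 1.0]}
    logits = {"head": (2.0, 0.0), "legs": (0.0, 2.0)}  # hypothetical learned values
    print(adaptive_fuse(st, ts, logits))
```

With logits (2.0, 0.0), an entity weights its ST features by about 0.88. In training, these logits would be optimized jointly with the rest of the network, letting each body part settle on its own mix of the two paths.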
In practice
- Use keypoint-guided entity extraction for robust spatial grounding.
- Employ dual spatio-temporal paths for diverse action types.
- Implement adaptive routing for entity-specific processing preferences.
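Keypoint-guided entity extraction can be approximated by grouping pose keypoints into body parts and drawing a padded box around each group. The sketch assumes the 17-keypoint COCO layout (0-4 face, 5-6 shoulders, 9-10 wrists, 11-12 hips, 13-16 knees and ankles); the grouping in `ENTITY_KEYPOINTS`, the margin, the minimum box size, and the confidence threshold are hypothetical choices, not values from the paper.

```python
from typing import Dict, List, Optional, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels

# Hypothetical grouping of COCO-17 keypoint indices into body-part entities.
ENTITY_KEYPOINTS: Dict[str, List[int]] = {
    "head": [0, 1, 2, 3, 4],  # nose, eyes, ears
    "left_hand": [9],  # left wrist
    "right_hand": [10],  # right wrist
    "torso": [5, 6, 11, 12],  # shoulders and hips
    "legs": [13, 14, 15, 16],  # knees and ankles
}


def entity_box(
    keypoints: List[Keypoint],
    indices: List[int],
    width: int,
    height: int,
    margin: float = 0.2,
    min_half: float = 16.0,
    conf_thresh: float = 0.3,
) -> Optional[Box]:
    # Keep only keypoints the pose estimator is reasonably confident about.
    pts = [(keypoints[i][0], keypoints[i][1]) for i in indices if keypoints[i][2] >= conf_thresh]
    if not pts:
        return None  # entity not visible in this frame
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    # Pad the keypoint extent by `margin`, with a floor so single-point entities still get a box.
    half_w = max((max(xs) - min(xs)) / 2 * (1 + margin), min_half)
    half_h = max((max(ys) - min(ys)) / 2 * (1 + margin), min_half)
    # Clip to the image bounds.
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(width), cx + half_w),
        min(float(height), cy + half_h),
    )


def extract_entity_boxes(
    keypoints: List[Keypoint], width: int, height: int
) -> Dict[str, Optional[Box]]:
    return {name: entity_box(keypoints, idx, width, height) for name, idx in ENTITY_KEYPOINTS.items()}
```

The boxes would then be used to crop or pool per-entity features from each frame. An entity with no confident keypoints returns `None`, so a downstream step has to decide how to handle missing body parts.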
Topics
- Micro-Action Recognition
- Dual-Path Networks
- Spatio-Temporal Modeling
- Adaptive Routing
- Mutual Action Consistency Loss
Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.