Generative Data Augmentation for Skeleton Action Recognition

2026-04-16 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

Researchers propose a conditional generative pipeline for data augmentation in skeleton action recognition, addressing the high cost and labor of collecting large, diverse, well-annotated 3D skeleton datasets. The method learns the distribution of real skeleton sequences conditioned on action labels and samples from it to synthesize diverse, high-fidelity data. It remains effective when only a few training samples are available, achieving competitive recognition performance in low-data settings and generalizing well. The pipeline uses a Transformer-based encoder-decoder architecture, a generative refinement module, and a dropout mechanism that balances fidelity and diversity during sampling. Experiments on the HumanAct12 and NTU-VIBE datasets show consistent accuracy improvements for multiple skeleton-based action recognition models in both few-shot and full-data settings.

Key takeaway

For research scientists developing skeleton-based action recognition models, consider integrating conditional generative data augmentation to offset the scarcity and cost of 3D skeleton datasets. In the reported experiments on HumanAct12 and NTU-VIBE, synthesizing diverse, high-fidelity training data consistently improved accuracy across several recognition models, with the clearest benefit in few-shot settings. Transformer-based encoder-decoder architectures with a refinement module are a reasonable starting point for balancing data fidelity and diversity.

Key insights

A conditional generative pipeline synthesizes diverse skeleton data to augment action recognition, improving performance in low-data settings.

Principles

Generative models can overcome data scarcity.
Balancing fidelity and diversity is key for synthetic data.
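The fidelity/diversity trade-off can be made concrete with two simple proxies. The sketch below assumes NumPy and skeleton sequences stored as arrays of shape (frames, joints, 3); the functions `diversity` and `fidelity_gap` are hypothetical illustrations, not metrics reported in the source article. Higher `diversity` means synthetic samples differ more from each other; lower `fidelity_gap` means they sit closer to real data.

```python
import numpy as np


def diversity(samples: np.ndarray) -> float:
    """Mean pairwise L2 distance between flattened synthetic sequences.

    Hypothetical proxy: higher values mean the generator produces more varied
    sequences. Requires at least two samples.
    """
    flat = samples.reshape(len(samples), -1)
    n = len(flat)
    if n < 2:
        raise ValueError("need at least two samples")
    # (n, n) matrix of distances; the zero diagonal does not affect the sum.
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    return float(dists.sum() / (n * (n - 1)))


def fidelity_gap(samples: np.ndarray, real: np.ndarray) -> float:
    """Mean distance from each synthetic sequence to its nearest real sequence.

    Hypothetical proxy: lower values mean synthetic data stays close to the
    real distribution (high fidelity), at the risk of merely copying it.
    """
    s = samples.reshape(len(samples), -1)
    r = real.reshape(len(real), -1)
    dists = np.linalg.norm(s[:, None, :] - r[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())


rng = np.random.default_rng(0)
real = rng.normal(size=(5, 10, 4, 3))          # 5 real sequences, 10 frames, 4 joints
noisy = real + 0.5 * rng.normal(size=real.shape)
print(diversity(noisy), fidelity_gap(noisy, real))
```

Copying real data verbatim drives `fidelity_gap` to zero without adding anything new, which is why both quantities have to be watched together.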

Method

The method employs a Transformer-based encoder-decoder with a generative refinement module to learn and synthesize skeleton sequences conditioned on action labels, and applies a dropout mechanism during sampling to balance diversity against fidelity.
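To illustrate how label conditioning and sampling-time dropout fit together, here is a minimal NumPy sketch. It is an untrained toy, not the authors' architecture: the class `ToyConditionalDecoder`, its single self-attention layer, and the idea of applying dropout to hidden states at sampling time as a diversity knob are all assumptions, and the encoder (which would map real sequences to the latent `z` during training) and the refinement module are omitted.

```python
import numpy as np


class ToyConditionalDecoder:
    """Hypothetical, untrained stand-in for a label-conditioned Transformer decoder."""

    def __init__(self, num_classes, num_frames, num_joints, d_model=32, seed=0):
        rng = np.random.default_rng(seed)
        self.T, self.J, self.d = num_frames, num_joints, d_model
        self.label_emb = rng.normal(0, 0.1, (num_classes, d_model))  # one vector per action label
        self.pos_emb = rng.normal(0, 0.1, (num_frames, d_model))     # one vector per frame
        self.Wq, self.Wk, self.Wv = (
            rng.normal(0, d_model ** -0.5, (d_model, d_model)) for _ in range(3)
        )
        self.W_out = rng.normal(0, d_model ** -0.5, (d_model, num_joints * 3))

    def _self_attention(self, h):
        # Single-head scaled dot-product attention across frames.
        q, k, v = h @ self.Wq, h @ self.Wk, h @ self.Wv
        scores = q @ k.T / np.sqrt(self.d)
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    def sample(self, label, z, dropout_p=0.0, rng=None):
        """Decode latent z (shape (d_model,)) into a (frames, joints, 3) sequence."""
        if not 0.0 <= dropout_p < 1.0:
            raise ValueError("dropout_p must be in [0, 1)")
        h = self.pos_emb + self.label_emb[label] + z  # condition every frame on the label
        h = h + self._self_attention(h)               # residual attention block
        if dropout_p > 0.0:
            # Assumed diversity knob: random inverted dropout at sampling time.
            if rng is None:
                rng = np.random.default_rng()
            mask = rng.random(h.shape) >= dropout_p
            h = h * mask / (1.0 - dropout_p)
        return (h @ self.W_out).reshape(self.T, self.J, 3)


decoder = ToyConditionalDecoder(num_classes=12, num_frames=60, num_joints=24)
z = np.zeros(32)
seq = decoder.sample(label=3, z=z, dropout_p=0.2, rng=np.random.default_rng(1))
print(seq.shape)  # (60, 24, 3)
```

The class count, frame count, and joint count above are illustrative values only. Raising `dropout_p` spreads samples further apart for the same label and latent, while `dropout_p=0.0` makes decoding deterministic.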

In practice

Apply generative augmentation for scarce 3D skeleton data.
Use Transformer-based architectures for sequence generation.
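Applying generative augmentation in practice mostly comes down to appending label-balanced synthetic samples to a small real training set. The sketch below assumes NumPy arrays and a hypothetical `sample_fn(label, rng)` interface standing in for whatever trained generator is used; it also returns an `is_synth` mask so synthetic samples can be down-weighted or filtered later if they hurt accuracy.

```python
import numpy as np


def augment_with_synthetic(real_x, real_y, sample_fn, n_per_class, rng=None):
    """Append n_per_class synthetic sequences for every label present in real_y.

    sample_fn(label, rng) is a hypothetical generator interface that returns
    one sequence with the same shape as real_x[0].
    Returns shuffled (x, y, is_synth) arrays.
    """
    if rng is None:
        rng = np.random.default_rng()
    labels = np.unique(real_y)
    # Generate class by class so synthetic labels line up with np.repeat below.
    synth_x = [sample_fn(int(c), rng) for c in labels for _ in range(n_per_class)]
    synth_y = np.repeat(labels, n_per_class)
    aug_x = np.concatenate([real_x, np.stack(synth_x)], axis=0)
    aug_y = np.concatenate([real_y, synth_y])
    is_synth = np.concatenate(
        [np.zeros(len(real_y), dtype=bool), np.ones(len(synth_y), dtype=bool)]
    )
    # Shuffle so synthetic and real samples are interleaved during training.
    perm = rng.permutation(len(aug_y))
    return aug_x[perm], aug_y[perm], is_synth[perm]


rng = np.random.default_rng(0)
few_shot_x = rng.normal(size=(6, 10, 4, 3))   # 2 real samples for each of 3 classes
few_shot_y = np.array([0, 0, 1, 1, 2, 2])
toy_sample_fn = lambda label, r: r.normal(size=(10, 4, 3)) + label  # placeholder generator
x, y, is_synth = augment_with_synthetic(few_shot_x, few_shot_y, toy_sample_fn, 3, rng)
print(x.shape, np.bincount(y), is_synth.sum())
```

How many synthetic samples to add per class is a tuning choice; the source does not specify a ratio.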

Topics

Generative Data Augmentation
Skeleton Action Recognition
Transformer-based Encoder-Decoder
Few-Shot Learning
HumanAct12 Dataset

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.