CalTennis: Large Multi-View Tennis Video Dataset and Benchmark of Monocular-to-3D Pose Estimation
Summary
The Caltech Tennis Dataset (CalTennis) is introduced as a large-scale video benchmark for monocular-to-3D pose estimation in diverse environments. Comprising over 11 million frames (51 hours) from 40 players, captured by 2-6 synchronized cameras at 60 Hz, CalTennis is 10 times larger than existing in-the-wild human motion datasets and 3 times larger than MOCAP-ground-truthed alternatives. It uniquely provides synchronized multi-view recordings of expert athletic motion, enabling label-free evaluation. The dataset utilizes a simple, standardized collection protocol with automated video calibration and synchronization. Benchmarking state-of-the-art methods on CalTennis reveals that while 3D joint angle recovery is accurate, models consistently struggle with depth and foot contact estimation. The authors propose novel footwork and stability metrics to expose these failure modes and guide future improvements.
Key takeaway
For computer vision engineers developing 3D pose estimation models, CalTennis offers a benchmark of over 11 million synchronized multi-view frames for evaluating performance on expert athletic motion. Your models likely recover 3D joint angles accurately, so prioritize depth and foot contact estimation, where the benchmarked state-of-the-art methods consistently struggle. Use the proposed footwork and stability metrics to pinpoint specific failure modes and guide development toward more robust athletic motion analysis.
Key insights
CalTennis provides a large-scale, multi-view benchmark for 3D pose estimation, highlighting depth and foot contact as critical challenges.
Principles
- Multi-view video enables label-free 3D pose evaluation.
- Simple protocols can yield large-scale, high-quality datasets.
- Current 3D pose models struggle with depth and foot contact.
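As a rough illustration of the first principle, one way a second calibrated view can stand in for 3D labels is to reproject a 3D prediction made from one camera into a held-out camera and measure the pixel distance to the 2D joints observed there. The sketch below assumes a pinhole camera model. The names `project` and `cross_view_error`, the intrinsics `K`, and the pose `(R, t)` are illustrative assumptions and are not taken from the CalTennis paper, whose exact evaluation protocol is not described in this summary.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project (N, 3) world points into a pinhole camera (assumed model)."""
    cam = points_3d @ R.T + t        # world -> camera coordinates
    uv = cam @ K.T                   # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]    # perspective divide -> pixel coordinates

def cross_view_error(pred_3d, K, R, t, joints_2d):
    """Mean pixel distance between a reprojected 3D prediction and the
    2D joints observed in a held-out view (hypothetical metric name)."""
    reproj = project(pred_3d, K, R, t)
    return float(np.mean(np.linalg.norm(reproj - joints_2d, axis=1)))

# Hypothetical held-out camera: looks at the scene from the side,
# rotated 90 degrees about the vertical axis and placed 5 m away.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.array([[0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0]])
t = np.array([0.0, 0.0, 5.0])

true_joint = np.array([[0.0, 0.0, 0.0]])
observed_2d = project(true_joint, K, R, t)   # 2D joint seen in the held-out view

# A prediction that is 0.5 m off along the first camera's depth axis (world z).
pred_joint = np.array([[0.0, 0.0, 0.5]])
err_px = cross_view_error(pred_joint, K, R, t, observed_2d)   # 100.0 pixels
```

A depth error along the source camera's optical axis is nearly invisible in that camera's own image, but it appears as a lateral shift in a camera viewing the scene from the side, which is why a held-out view is informative without any 3D ground truth.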
Method
Data is collected with a simple, standardized protocol that requires no specialized equipment or expertise. Video calibration and synchronization are fully automated.
In practice
- Benchmark monocular-to-3D pose algorithms on CalTennis.
- Focus model improvements on depth and foot contact.
- Apply footwork and stability metrics for action analysis.
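The proposed footwork and stability metrics are not defined in this summary, so the sketch below is not the authors' metric. As a generic stand-in, it computes foot sliding, a commonly used foot-contact diagnostic: the average horizontal foot speed over frames labeled as ground contact, where a well-behaved estimate should be near zero. The function name `foot_slide`, the z-up axis convention, and the per-frame contact flags are assumptions made for illustration. The default of 60 frames per second matches the dataset's 60 Hz capture rate.

```python
import numpy as np

def foot_slide(foot_xyz, contact, fps=60.0):
    """Mean horizontal foot speed (m/s) during ground contact.

    foot_xyz: (T, 3) foot positions in meters, z up (assumed convention).
    contact:  (T,) flags, True where the foot is labeled as on the ground.
    """
    foot_xyz = np.asarray(foot_xyz, dtype=float)
    contact = np.asarray(contact, dtype=bool)
    # Horizontal velocity between consecutive frames.
    vel = np.diff(foot_xyz[:, :2], axis=0) * fps
    # Count a step only if the foot is in contact at both of its frames.
    both = contact[1:] & contact[:-1]
    if not both.any():
        return 0.0
    return float(np.linalg.norm(vel[both], axis=1).mean())
```

Applied to the foot trajectories predicted by a monocular model, a value well above zero indicates that planted feet drift along the ground, one visible symptom of poor foot contact estimation.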
Topics
- CalTennis
- 3D Pose Estimation
- Multi-View Video
- Human Motion Capture
- Video Datasets
- Depth Estimation
- Action Analysis
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.