Multi-Task Tennis Stroke Biomechanics Analysis Using MediaPipe Pose

2026-06-14 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Expert, quick

Summary

The work presents a multi-task pipeline for tennis stroke biomechanics analysis that runs on plain RGB video, using the 33 landmarks that MediaPipe Pose Landmarker reports in metric world coordinates. The system detects strokes automatically with a weighted joint velocity score (s(t) = 0.5 v_wrist + 0.3 m_elbow + 0.2 m_shoulder), then performs stroke recognition, shot direction prediction, and posture quality grading, with a rule-based feedback layer on top. The core model is TennisTransformerGPU, a 564,103-parameter transformer (4 layers, 4 heads, d=128) with three output heads that processes sequences of 30 frames by 39 features. Trained on 1,281 strokes from 7 professionals and 1 amateur, it reached 83.7% accuracy on stroke type, 61.9% on direction, and 62.6% on posture. In a cross-player evaluation, stroke-type accuracy held at 82.9%, but direction prediction failed to transfer. An ablation study found that world coordinates are essential: replacing them with image-space landmarks significantly reduced accuracy. The system is reported to be fully reproducible on Kaggle's free T4 GPU tier.

Key takeaway

For sports biomechanics researchers developing automated coaching tools, this work highlights the importance of using metric world coordinates from a pose estimator such as MediaPipe Pose. In the reported ablation, switching to image-space landmarks significantly reduced accuracy. Prioritize metric 3D pose data, and consider compact transformer architectures for multi-task analysis. Validate each task across players separately: stroke-type recognition transferred to unseen players, while direction prediction did not.

Key insights

A multi-task transformer pipeline analyzes tennis biomechanics from RGB video, leveraging MediaPipe's world coordinates for robust stroke recognition and posture grading.

Principles

Metric world coordinates are critical for cross-player pose transferability.
Automated stroke detection can use weighted joint velocity scores.
Compact transformers can handle multi-task pose analysis effectively.
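
To give a sense of what "compact" means here, the following Python sketch counts parameters for one hypothetical configuration built around the figures in the summary (4 layers, d=128, 30-frame by 39-feature input, three heads). The feed-forward width of 256, the learned positional embedding, the 64-unit hidden layer in each head, and the total of 7 output classes are all assumptions, not details from the article. Under those assumptions the count comes to 564,103, which matches the reported size, but that does not establish that the authors used this exact layout.

```python
def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def encoder_layer(d_model, d_ff):
    """Parameters of one standard transformer encoder layer.

    The number of attention heads does not change the count, because the
    heads only partition the same d_model-wide projections.
    """
    attention = linear(d_model, 3 * d_model) + linear(d_model, d_model)
    feed_forward = linear(d_model, d_ff) + linear(d_ff, d_model)
    layer_norms = 2 * (2 * d_model)  # two LayerNorms, each with scale and shift
    return attention + feed_forward + layer_norms


def model_params(seq_len=30, n_features=39, d_model=128, n_layers=4,
                 d_ff=256, n_heads_out=3, head_hidden=64, total_classes=7):
    """Total parameters for the hypothetical layout described above.

    d_ff, head_hidden and total_classes are assumptions for illustration.
    """
    input_projection = linear(n_features, d_model)
    positional_embedding = seq_len * d_model  # assumed learned, one vector per frame
    encoder = n_layers * encoder_layer(d_model, d_ff)
    # Each output head: d_model -> head_hidden -> its own class count.
    # Only the sum of class counts across heads affects the total.
    heads = (n_heads_out * linear(d_model, head_hidden)
             + head_hidden * total_classes + total_classes)
    return input_projection + positional_embedding + encoder + heads


if __name__ == "__main__":
    print(model_params())
```

Almost all of the budget sits in the four encoder layers (132,480 parameters each in this layout), so the three task heads add very little on top of the shared encoder.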

Method

The pipeline automatically finds strokes via a weighted joint velocity score, then feeds 30-frame, 39-feature sequences from MediaPipe Pose (world coordinates) into a 564,103-parameter TennisTransformerGPU with three parallel output heads.
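
The stroke-detection step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the summary gives only the weights (0.5, 0.3, 0.2), so the sketch assumes that all three terms are per-frame joint speeds computed from world coordinates by finite differences. The peak picker, its `threshold`, and its `min_gap` are hypothetical, since the summary does not say how strokes are selected from the score.

```python
import math

# Weights from the summary's score: 0.5 wrist, 0.3 elbow, 0.2 shoulder.
WEIGHTS = {"wrist": 0.5, "elbow": 0.3, "shoulder": 0.2}


def joint_speed(track, fps):
    """Per-frame speed (m/s) of one joint, given (x, y, z) positions in metres."""
    speeds = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(track, track[1:]):
        speeds.append(math.dist(prev, cur) * fps)
    return speeds


def stroke_score(tracks, fps):
    """Weighted sum of wrist, elbow and shoulder speeds for every frame.

    Assumption: each term of the published score is a joint speed.
    """
    per_joint = {name: joint_speed(tracks[name], fps) for name in WEIGHTS}
    n_frames = len(per_joint["wrist"])
    return [sum(WEIGHTS[j] * per_joint[j][t] for j in WEIGHTS)
            for t in range(n_frames)]


def find_peaks(score, threshold, min_gap):
    """Hypothetical peak picker: local maxima above `threshold`.

    Scanning left to right, a peak closer than `min_gap` frames to the
    previously accepted peak is skipped.
    """
    peaks = []
    for t in range(1, len(score) - 1):
        is_local_max = score[t] >= score[t - 1] and score[t] > score[t + 1]
        if is_local_max and score[t] > threshold:
            if not peaks or t - peaks[-1] >= min_gap:
                peaks.append(t)
    return peaks
```

Given a dictionary of wrist, elbow and shoulder tracks for the hitting arm, `find_peaks(stroke_score(tracks, fps), threshold, min_gap)` returns candidate stroke frames around which fixed-length windows can be cut.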

In practice

Use MediaPipe Pose Landmarker for robust 3D pose estimation in sports.
Implement weighted joint velocity for automatic event detection in video.
Consider compact transformers for multi-task biomechanical analysis.
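
As one way to put these suggestions together, the sketch below turns per-frame world landmarks into the 30-frame by 39-feature input described in the summary. The article's feature definition is not given here, so the sketch assumes that the 39 features are the (x, y, z) world coordinates of 13 landmarks; the particular 13 MediaPipe Pose indices chosen (nose, shoulders, elbows, wrists, hips, knees, ankles) and the centred, clamped window are hypothetical. Frames are plain lists of 33 (x, y, z) tuples; with the MediaPipe Tasks API these would be read from the `x`, `y`, `z` fields of each entry in the result's `pose_world_landmarks`.

```python
# Hypothetical selection of 13 MediaPipe Pose landmarks: 13 x 3 = 39 features.
# 0 nose, 11/12 shoulders, 13/14 elbows, 15/16 wrists,
# 23/24 hips, 25/26 knees, 27/28 ankles.
SELECTED = (0, 11, 12, 13, 14, 15, 16, 23, 24, 25, 26, 27, 28)
WINDOW = 30  # frames per stroke, as stated in the summary


def frame_features(world_landmarks):
    """Flatten the selected (x, y, z) landmarks of one frame into 39 values."""
    features = []
    for idx in SELECTED:
        x, y, z = world_landmarks[idx]
        features.extend((x, y, z))
    return features


def stroke_window(frames, peak, window=WINDOW):
    """Cut `window` frames centred on `peak`, shifted to stay inside the clip."""
    if len(frames) < window:
        raise ValueError("clip is shorter than the window")
    start = peak - window // 2
    start = max(0, min(start, len(frames) - window))  # clamp to clip bounds
    return [frame_features(f) for f in frames[start:start + window]]
```

The resulting nested list has shape 30 by 39 and can be converted to a tensor for whichever sequence model is used downstream.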

Topics

Tennis Biomechanics
Multi-Task Learning
MediaPipe Pose
Transformer Networks
Pose Estimation
Sports Analytics

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.