ParaScale: Scale-Calibrated Camera-Motion Transfer via a Gauge-Invariant Parallax Number

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

ParaScale is a plug-and-play module for camera-motion transfer that lets creators reuse a cinematic move across scenes of incompatible scale. Reusing a trajectory naively yields motion that is either imperceptible or violently exaggerated, because translation-induced image motion scales as ||T||/Z: a monocular trajectory is meaningful only up to a depth-scale gauge. The core idea is the Parallax Number Pi = ||Delta T|| / Zbar, the ratio of a camera translation step to a representative scene depth. Pi is dimensionless and gauge-invariant, and the paper proves that preserving it is essential for scale-faithful transfer.

ParaScale sits between pose extraction and pose injection. It reads Pi from the reference video and re-realizes it against the target scene's depth, frame by frame, without altering rotation and without retraining the generator. The paper also introduces the Parallax Consistency Error (PCE), a scale-symmetric metric that exposes scene-scale mismatch. Across four orders of magnitude of scene scale, ParaScale reduces PCE by more than 3x relative to uncalibrated transfer while maintaining visual fidelity.
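The scaling claim in the summary (image motion proportional to ||T||/Z) can be written out in a few lines. What follows is a standard first-order pinhole argument, not the paper's own derivation; the focal length f, the gauge factor s, and the subscripts ref and tgt are notation assumed here for illustration.

```latex
\begin{align}
  \|\Delta u\| &\approx f\,\frac{\|\Delta T\|}{Z}
    && \text{(image shift of a point at depth $Z$ under a small lateral translation $\Delta T$)} \\
  \Pi &= \frac{\|\Delta T\|}{\bar{Z}}
    && \text{(Parallax Number)} \\
  \Pi' &= \frac{\|s\,\Delta T\|}{s\,\bar{Z}} = \Pi
    && \text{(unchanged under the gauge $(\Delta T, Z) \mapsto (s\,\Delta T,\, s\,Z)$, $s > 0$)} \\
  \Pi_{\text{naive}} &= \frac{\|\Delta T_{\text{ref}}\|}{\bar{Z}_{\text{tgt}}}
    = \Pi_{\text{ref}}\,\frac{\bar{Z}_{\text{ref}}}{\bar{Z}_{\text{tgt}}}
    && \text{(raw trajectory reuse)}
\end{align}
```

The last line shows the failure mode: if the target scene is 100 times deeper than the reference, raw reuse produces 100 times too little parallax, and if it is 100 times shallower, 100 times too much.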

Key takeaway

If you build video generation tools and need to transfer camera motion between scenes of very different scale, consider inserting ParaScale between pose extraction and pose injection. By preserving the gauge-invariant Parallax Number Pi rather than the raw trajectory, it keeps a cinematic move from becoming imperceptible or violently exaggerated in the target scene. It drops into existing pose-conditioned generators without retraining and cuts Parallax Consistency Error by more than 3x relative to uncalibrated transfer.
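This summary does not give the PCE formula. To make "scale-symmetric" concrete, the sketch below uses a mean absolute log-ratio of per-frame Parallax Numbers, which penalizes motion that is k times too large exactly as much as motion that is k times too small. The function name, the log-ratio form, and the eps guard are all assumptions made for illustration and should not be read as the paper's definition.

```python
import numpy as np


def log_ratio_parallax_error(pi_ref, pi_gen, eps=1e-12):
    """Hypothetical scale-symmetric error between two Parallax Number sequences.

    pi_ref, pi_gen: per-frame Pi values for the reference and generated videos.
    Returns mean |log(pi_gen / pi_ref)|; eps only guards against log(0).
    NOTE: an assumed stand-in, not the PCE definition from the paper.
    """
    pi_ref = np.asarray(pi_ref, dtype=float)
    pi_gen = np.asarray(pi_gen, dtype=float)
    return float(np.mean(np.abs(np.log((pi_gen + eps) / (pi_ref + eps)))))


pi_ref = np.array([0.05, 0.10, 0.08])
too_big = log_ratio_parallax_error(pi_ref, 10.0 * pi_ref)
too_small = log_ratio_parallax_error(pi_ref, pi_ref / 10.0)
# Both mismatches are a factor of 10, so the two errors agree.
print(too_big, too_small)
```

A plain difference or ratio would not have this symmetry, which is why a log-domain comparison is a natural candidate when mismatches can go in either direction.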

Key insights

Scale-faithful camera-motion transfer requires preserving the gauge-invariant Parallax Number Pi, not raw trajectories.
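A minimal sketch of why Pi, and not the raw trajectory, is the quantity worth preserving: computing Pi per step and rescaling the whole reconstruction leaves it unchanged. The function name, the array layout, and the choice of the per-frame median as Zbar are assumptions made here; the summary does not say how the paper aggregates depth.

```python
import numpy as np


def parallax_numbers(positions, depths):
    """Per-step Pi_t = ||T_{t+1} - T_t|| / Zbar_t.

    positions: (N, 3) camera centers.
    depths:    (N, H, W) depth maps in the same (arbitrary) scale gauge.
    Zbar_t is taken as the median depth of frame t (an assumption).
    Returns an array of N - 1 dimensionless values.
    """
    positions = np.asarray(positions, dtype=float)
    depths = np.asarray(depths, dtype=float)
    step = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    zbar = np.median(depths[:-1].reshape(len(depths) - 1, -1), axis=1)
    return step / zbar


positions = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.3, 0.0, 0.0]])
depths = np.full((3, 2, 2), 2.0)

pi = parallax_numbers(positions, depths)
# Rescale the whole reconstruction by 100 (a change of gauge): Pi is unchanged.
pi_rescaled = parallax_numbers(100.0 * positions, 100.0 * depths)
print(pi, pi_rescaled)
```

The raw step lengths differ by a factor of 100 between the two calls, while the Pi values match, which is the sense in which Pi is gauge-invariant.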

Method

Operating between pose extraction and pose injection, ParaScale reads the Parallax Number Pi from a reference video and re-realizes it against the target scene's depth, frame by frame, leaving rotation untouched.
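The read-then-re-realize step might look roughly like the following. This is a sketch under stated assumptions, not the authors' implementation: `retarget_translation` is a hypothetical helper, Zbar is passed in as one precomputed scalar per frame, the translation direction of each step is kept, and the output trajectory starts at the origin of the target scene. Rotations are not touched; a caller would pass them through unchanged.

```python
import numpy as np


def retarget_translation(ref_positions, ref_zbar, tgt_zbar):
    """Rebuild camera centers so each step has the reference Pi in the target scene.

    ref_positions: (N, 3) reference camera centers.
    ref_zbar:      (N,) representative depth per reference frame.
    tgt_zbar:      (N,) representative depth per target frame.
    Returns (N, 3) target camera centers, starting at the origin.
    """
    ref_positions = np.asarray(ref_positions, dtype=float)
    ref_zbar = np.asarray(ref_zbar, dtype=float)
    tgt_zbar = np.asarray(tgt_zbar, dtype=float)

    steps = np.diff(ref_positions, axis=0)              # (N-1, 3)
    norms = np.linalg.norm(steps, axis=1, keepdims=True)
    pi = norms[:, 0] / ref_zbar[:-1]                    # read Pi from the reference

    # Unit direction of each step; zero-length steps stay zero.
    directions = np.divide(steps, norms, out=np.zeros_like(steps), where=norms > 0)

    # Re-realize: step length in the target is Pi times the target depth.
    new_steps = directions * (pi * tgt_zbar[:-1])[:, None]
    return np.vstack([np.zeros(3), np.cumsum(new_steps, axis=0)])


ref_positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
ref_zbar = np.array([10.0, 10.0, 10.0])        # e.g. a room-scale reference
tgt_zbar = np.array([1000.0, 1000.0, 1000.0])  # e.g. a landscape-scale target

tgt_positions = retarget_translation(ref_positions, ref_zbar, tgt_zbar)
print(tgt_positions)
```

With constant depths this reduces to a uniform rescale of the path by the depth ratio; the per-frame form matters when either scene's representative depth changes along the move.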

Best for: Research Scientist, AI Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.