SAM 3.1: Faster and More Accessible Real-Time Video Detection and Tracking With Multiplexing and Global Reasoning - AI at Meta

· Source: ai.meta.com via Google News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Data Science & Analytics · Depth: Advanced, long

Summary

Meta has released SAM 3.1, an update to its Segment Anything Model 3 (SAM 3) that makes video processing substantially more efficient. SAM 3.1 introduces object multiplexing, which lets it track up to 16 objects in a single forward pass. For videos with a medium number of objects, this doubles throughput from 16 to 32 frames per second on an H100 GPU, enabling real-time tracking in complex videos while reducing GPU resource requirements. SAM 3, the underlying model, unifies detection, segmentation, and tracking and accepts text, exemplar, and visual prompts, so it is not limited to a fixed label set. It achieves a 2x gain over existing systems on the SA-Co benchmark and processes a single image with more than 100 detected objects in 30 milliseconds on an H200 GPU. Meta also introduced the Segment Anything Playground for experimentation and SAM 3D for 3D object and human reconstruction.
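To make the throughput figures concrete, the short sketch below converts the reported 16 and 32 frames per second into per-frame processing time and compares each against a real-time budget. The 30 fps source-video rate is an assumption chosen for illustration and does not come from the original article; the helper name is likewise hypothetical.

```python
def frame_time_ms(fps: float) -> float:
    """Per-frame time in milliseconds implied by a throughput in frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Throughput figures reported in the summary (H100, medium number of objects).
before_ms = frame_time_ms(16)  # time per frame before multiplexing
after_ms = frame_time_ms(32)   # time per frame with multiplexing

# ASSUMPTION: the source video plays at 30 fps. This rate is illustrative only.
ASSUMED_VIDEO_FPS = 30
budget_ms = frame_time_ms(ASSUMED_VIDEO_FPS)

print(f"before: {before_ms:.2f} ms/frame, fits budget: {before_ms <= budget_ms}")
print(f"after:  {after_ms:.2f} ms/frame, fits budget: {after_ms <= budget_ms}")
```

Under that assumed frame rate, 16 frames per second falls short of the per-frame budget while 32 frames per second fits within it, which is one way to read the "real-time" claim.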

Key takeaway

For computer vision engineers building real-time video analysis or media editing applications, SAM 3.1 is a significant performance upgrade. Object multiplexing doubles processing speed, which makes complex multi-object tracking feasible on more accessible hardware. Consider integrating SAM 3.1 to raise throughput and lower compute costs, especially in applications that need high frame rates or track many objects at once. The Segment Anything Playground is a quick way to prototype new creative effects.

Key insights

SAM 3.1 enhances real-time video object tracking via multiplexing, doubling processing speed and reducing GPU demands.

Method

SAM 3.1 uses object multiplexing to process up to 16 objects in parallel in a single forward pass. SAM 3's training data comes from a data engine that combines SAM 3 itself, Llama-based captioners, and human and AI annotators to create datasets at scale.
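The effect of multiplexing on the number of forward passes can be sketched with a toy grouping function. This is not the SAM 3.1 API or its implementation: the function names are hypothetical, and the comparison baseline of one pass per object is an assumption made for illustration. Only the limit of 16 objects per pass comes from the summary.

```python
import math
from typing import List, Sequence

# From the summary: up to 16 objects are handled in one forward pass.
MAX_OBJECTS_PER_PASS = 16


def multiplex(object_ids: Sequence[int],
              capacity: int = MAX_OBJECTS_PER_PASS) -> List[List[int]]:
    """Split tracked object IDs into groups, one group per forward pass."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    return [list(object_ids[i:i + capacity])
            for i in range(0, len(object_ids), capacity)]


def passes_per_frame(num_objects: int, capacity: int) -> int:
    """Number of forward passes needed to cover all objects in one frame."""
    return math.ceil(num_objects / capacity)


# Hypothetical scene with 40 tracked objects.
groups = multiplex(range(40))
print([len(g) for g in groups])

# ASSUMPTION: a non-multiplexed baseline runs one pass per object (capacity 1).
print(passes_per_frame(40, capacity=1), "passes without multiplexing")
print(passes_per_frame(40, capacity=MAX_OBJECTS_PER_PASS), "passes with multiplexing")
```

In this toy model the pass count per frame grows with the number of groups, not the number of objects. The measured speedup in the summary is 2x, so pass count alone should not be read as a predictor of real throughput.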

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by ai.meta.com via Google News.