Driving Video Retrieval for Complex Queries with Structured Grounding

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

STRIVE-D is a data-calibrated framework for retrieving driving videos that contain complex dynamic events. Existing vision-language, keyword-based, and rule-based methods often fail to identify events such as cut-ins or hard braking, either because text descriptions are limited or because rules are brittle. STRIVE-D addresses this by using weakly labeled in-domain videos to estimate how reliable each rule is, adapt the rules to observed data, and combine the calibrated rule scores with vision-language and keyword-based retrieval signals. The result is better retrieval of the specific dynamic events needed for data curation and safety validation. Across three driving benchmarks, including newly released human-annotated event data on DrivingDojo, STRIVE-D delivers up to 84% relative improvement in top-1 accuracy over current methods.

Key takeaway

For autonomous driving engineers and data curation specialists working on safety validation, STRIVE-D targets a known weak spot: retrieving complex dynamic events. If your current vision-language or rule-based systems struggle to identify nuanced scenarios such as cut-ins, data-calibrated rule fusion is worth investigating. The reported gain is up to 84% relative improvement in top-1 accuracy (a relative gain over baseline methods, not an 84-point absolute increase), which would support more precise data selection and more robust testing of critical driving behaviors.

Key insights

STRIVE-D improves driving video retrieval for complex events by calibrating rule-based methods with weakly labeled data and fusing signals.

Principles

Method

STRIVE-D estimates query rule reliability using weakly labeled in-domain videos, adapts rules to observed data, and fuses these calibrated rule scores with vision-language and keyword-based retrieval signals.
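
The calibrate-then-fuse idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not specify how reliability is estimated or how signals are fused, so the smoothed-precision estimate, the weighted-average fusion, and every name here (`Clip`, `estimate_rule_reliability`, `fuse`, `rank`, the score fields, and the default weights) are assumptions made for illustration. The rule-adaptation step is omitted.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    rule_score: float     # hypothetical rule-based detector score in [0, 1]
    vl_score: float       # hypothetical vision-language similarity in [0, 1]
    keyword_score: float  # hypothetical keyword-match score in [0, 1]


def estimate_rule_reliability(weak_clips, threshold=0.5, prior=0.5, prior_strength=2.0):
    """Assumed reliability estimate: smoothed precision of the rule on weak labels.

    weak_clips is an iterable of (rule_score, weak_label) pairs, where
    weak_label is True if the weak annotation says the event is present.
    The prior keeps the estimate stable when the rule rarely fires.
    """
    fired = [label for score, label in weak_clips if score >= threshold]
    hits = sum(1 for label in fired if label)
    return (hits + prior * prior_strength) / (len(fired) + prior_strength)


def fuse(clip, reliability, w_vl=1.0, w_kw=1.0):
    """Assumed fusion: weighted average, with the rule weighted by its reliability."""
    total = reliability + w_vl + w_kw
    weighted = (
        reliability * clip.rule_score
        + w_vl * clip.vl_score
        + w_kw * clip.keyword_score
    )
    return weighted / total


def rank(clips, reliability):
    """Return clips sorted by fused score, best match first."""
    return sorted(clips, key=lambda c: fuse(c, reliability), reverse=True)


if __name__ == "__main__":
    # Made-up weak labels for one query rule (for example, a cut-in rule).
    weak = [(0.9, True), (0.8, True), (0.7, False), (0.2, True)]
    reliability = estimate_rule_reliability(weak)

    candidates = [
        Clip("a", rule_score=1.0, vl_score=0.2, keyword_score=0.0),
        Clip("b", rule_score=0.0, vl_score=0.5, keyword_score=0.5),
    ]
    for clip in rank(candidates, reliability):
        print(clip.clip_id, round(fuse(clip, reliability), 3))
```

In this sketch a rule that is often wrong on the weakly labeled clips contributes less to the final ranking, so a clip that only the rule favors can be outranked by one that the vision-language and keyword signals agree on.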

Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.