Driving Video Retrieval for Complex Queries with Structured Grounding
Summary
STRIVE-D is a new data-calibrated retrieval framework designed to improve driving video retrieval for complex dynamic events in autonomous driving. Existing vision-language, keyword-based, and rule-based methods often fail to accurately identify events like cut-ins or hard braking due to limitations in text descriptions or rule brittleness. STRIVE-D addresses this by utilizing weakly labeled in-domain videos to estimate rule reliability, adapt rules to observed data, and integrate calibrated rule scores with vision-language and keyword-based retrieval signals. This approach significantly enhances the ability to find specific dynamic events crucial for data curation and safety validation. Across three driving benchmarks, including newly released human-annotated event data on DrivingDojo, STRIVE-D delivers up to 84% relative improvement in top-1 accuracy over current methods.
Key takeaway
For autonomous driving engineers and data curation specialists focused on safety validation, STRIVE-D targets a specific gap: retrieving complex dynamic events that text descriptions capture poorly and hand-written rules capture unreliably. If your current vision-language or rule-based systems struggle to identify scenarios like cut-ins or hard braking, consider calibrating your rules on weakly labeled in-domain data and fusing the calibrated scores with your other retrieval signals. The reported gain is up to 84% relative improvement in top-1 accuracy over current methods across three driving benchmarks, which supports more precise data selection and more robust testing of critical driving behaviors.
Key insights
STRIVE-D improves driving video retrieval for complex events by calibrating rule-based methods with weakly labeled in-domain data and fusing the calibrated rule scores with vision-language and keyword-based signals.
Principles
- Rule-based retrieval needs data calibration for robustness.
- Fusing calibrated rule scores with vision-language and keyword-based signals improves accuracy.
- Weakly labeled in-domain videos make it possible to estimate how reliable each rule is and to adapt rules to observed data.
Method
STRIVE-D estimates query rule reliability using weakly labeled in-domain videos, adapts rules to observed data, and fuses these calibrated rule scores with vision-language and keyword-based retrieval signals.
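The summary does not say how reliability is estimated or how the signals are combined, so the following is only a minimal sketch of the general idea, not the STRIVE-D implementation. It assumes reliability is a smoothed precision of the rule on weakly labeled clips and that fusion is a weighted sum in which the rule term is scaled by that reliability. All names (`Candidate`, `estimate_rule_reliability`, `fuse`, `rank`), the weights, and the score ranges are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    video_id: str
    vl_score: float       # vision-language similarity, assumed to lie in [0, 1]
    keyword_score: float  # keyword match score, assumed to lie in [0, 1]
    rule_score: float     # raw rule output, assumed to lie in [0, 1]


def estimate_rule_reliability(rule_fired, weak_labels, prior=0.5, prior_strength=2.0):
    """Smoothed precision of a rule on weakly labeled clips (an assumed estimator).

    rule_fired[i] is True if the rule fired on clip i; weak_labels[i] is 1 if the
    weak label says the event is present. With no firings, the prior is returned.
    """
    labels_when_fired = [label for hit, label in zip(rule_fired, weak_labels) if hit]
    correct = sum(labels_when_fired)
    return (correct + prior * prior_strength) / (len(labels_when_fired) + prior_strength)


def fuse(candidate, reliability, w_vl=0.5, w_kw=0.2, w_rule=0.3):
    """Weighted sum of the three signals; the rule term is scaled by its reliability."""
    return (
        w_vl * candidate.vl_score
        + w_kw * candidate.keyword_score
        + w_rule * reliability * candidate.rule_score
    )


def rank(candidates, reliability):
    """Return candidates sorted by fused score, best first."""
    return sorted(candidates, key=lambda c: fuse(c, reliability), reverse=True)


# Illustrative values only, not taken from the paper.
reliability = estimate_rule_reliability(
    rule_fired=[True, True, True, False],
    weak_labels=[1, 1, 0, 1],
)
clips = [
    Candidate("clip_a", vl_score=0.6, keyword_score=0.5, rule_score=0.0),
    Candidate("clip_b", vl_score=0.5, keyword_score=0.5, rule_score=1.0),
]
ranked_ids = [c.video_id for c in rank(clips, reliability)]
```

Under this design, a rule that rarely agrees with the weak labels contributes little to the final ranking, so a brittle rule degrades gracefully toward the vision-language and keyword signals instead of overriding them.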
In practice
- Use STRIVE-D for autonomous driving safety validation.
- Apply data calibration to brittle rule-based systems.
- Integrate diverse retrieval signals for complex queries.
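One way to picture "applying data calibration to a brittle rule" is to tune a rule's threshold against weakly labeled clips instead of fixing it by hand. The sketch below is an assumption-laden illustration, not the method from the paper: it takes a hypothetical hard-braking rule that fires when peak deceleration exceeds a threshold and picks, from a small candidate grid, the threshold with the best F1 score against the weak labels. The function names, the candidate grid, and the example values are all invented for illustration.

```python
def f1_at_threshold(peak_decel, weak_labels, threshold):
    """F1 of the rule 'peak deceleration >= threshold' against weak labels."""
    tp = fp = fn = 0
    for value, label in zip(peak_decel, weak_labels):
        fired = value >= threshold
        if fired and label:
            tp += 1
        elif fired and not label:
            fp += 1
        elif not fired and label:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def adapt_threshold(peak_decel, weak_labels, candidates):
    """Pick the candidate threshold with the highest F1 (first one wins on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(peak_decel, weak_labels, t))


# Illustrative values only: peak deceleration per clip (assumed m/s^2) and weak labels.
peak_decel = [1.0, 2.0, 3.5, 4.0, 5.0]
weak_labels = [0, 0, 1, 1, 1]
best = adapt_threshold(peak_decel, weak_labels, candidates=[2.0, 3.0, 4.5])
```

Because weak labels are noisy, a threshold chosen this way is best treated as a starting point and rechecked on a small human-annotated set where one is available.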
Topics
- Driving Video Retrieval
- Autonomous Driving
- Vision-Language Models
- Rule-Based Systems
- Data Calibration
- DrivingDojo Benchmark
- Complex Event Detection
Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.