Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval Using Language

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

Open-Set Video Moment Retrieval (OS-VMR) is a new task that addresses a limitation of traditional Video Moment Retrieval (VMR) systems: they implicitly assume that every query is relevant to the video. Under this closed-set assumption, an out-of-distribution (OOD) query still returns a moment, which is necessarily incorrect and poses risks in high-stakes applications such as criminal activity detection. The authors propose OpenVMR, a model that first separates in-distribution (ID) queries from OOD queries using a normalizing flow and then performs moment retrieval only for ID queries. OpenVMR learns the ID distribution, introduces an uncertainty score that defines the ID-OOD boundary, and refines this boundary by pulling ID query features together. For retrieval, it combines video-query and frame-query matching for cross-modal interaction with a positive-unlabeled learning module. Experiments on three VMR datasets demonstrate OpenVMR's effectiveness.

Key takeaway

Machine learning engineers building video analysis systems, especially in high-risk domains, must account for out-of-distribution queries. Traditional Video Moment Retrieval models can return dangerous false positives when given irrelevant inputs. An open-set approach such as OpenVMR, which detects and rejects OOD queries before retrieval, is crucial for system reliability and for preventing irrecoverable losses in critical applications.

Key insights

Open-Set Video Moment Retrieval rejects irrelevant queries while precisely retrieving relevant video moments.
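
The reject-then-retrieve contract described above can be sketched as a thin gate around any retriever. This is an illustrative sketch only, not OpenVMR's code: `open_set_retrieve`, `toy_uncertainty`, `toy_retrieve`, and the threshold value are hypothetical names and numbers chosen for the example.

```python
from typing import Callable, Optional, Sequence, Tuple

Moment = Tuple[float, float]  # (start_sec, end_sec)


def open_set_retrieve(
    query_feat: Sequence[float],
    uncertainty: Callable[[Sequence[float]], float],
    retrieve: Callable[[Sequence[float]], Moment],
    threshold: float,
) -> Optional[Moment]:
    """Return a moment for accepted queries, or None for rejected ones."""
    if uncertainty(query_feat) > threshold:
        return None  # treated as out-of-distribution: no moment is returned
    return retrieve(query_feat)


# Hypothetical stand-ins: squared distance from the origin as the
# uncertainty score, and a retriever that always returns a fixed moment.
def toy_uncertainty(feat: Sequence[float]) -> float:
    return sum(v * v for v in feat)


def toy_retrieve(feat: Sequence[float]) -> Moment:
    return (12.0, 18.5)


accepted = open_set_retrieve([0.1, 0.2], toy_uncertainty, toy_retrieve, threshold=1.0)
rejected = open_set_retrieve([3.0, 4.0], toy_uncertainty, toy_retrieve, threshold=1.0)
```

A closed-set system corresponds to calling `retrieve` unconditionally, which is why it always produces a moment even for irrelevant queries.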

Method

OpenVMR separates ID from OOD queries by modeling the ID distribution with a normalizing flow. An uncertainty score then defines the separating boundary, which is refined by pulling ID query features together. For ID queries, coarse-grained (video-query) and fine-grained (frame-query) cross-modal matching and positive-unlabeled learning then perform the retrieval.
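
To make the density-then-threshold idea concrete, here is a minimal sketch that assumes the simplest possible normalizing flow: a single diagonal affine map onto a standard-normal base. It is not the paper's architecture. The class `DiagonalAffineFlow`, the choice of negative log-likelihood as the uncertainty score, the 95% quantile threshold, and the synthetic 4-dimensional features are all assumptions made for illustration.

```python
import math
import random
import statistics


class DiagonalAffineFlow:
    """One-layer flow z = (x - mu) / sigma with a standard-normal base density."""

    def fit(self, features):
        dims = list(zip(*features))
        self.mu = [statistics.fmean(d) for d in dims]
        # Floor sigma to avoid division by zero on constant dimensions.
        self.sigma = [max(statistics.pstdev(d), 1e-6) for d in dims]
        return self

    def log_prob(self, x):
        # Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|
        z = [(xi - m) / s for xi, m, s in zip(x, self.mu, self.sigma)]
        log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
        log_det = -sum(math.log(s) for s in self.sigma)
        return log_base + log_det

    def uncertainty(self, x):
        # Assumed score: negative log-likelihood under the ID density.
        return -self.log_prob(x)


def fit_threshold(flow, id_features, quantile=0.95):
    """Pick the boundary so that roughly `quantile` of ID samples fall inside it."""
    scores = sorted(flow.uncertainty(x) for x in id_features)
    index = min(int(quantile * len(scores)), len(scores) - 1)
    return scores[index]


# Synthetic stand-in for ID query features (hypothetical data).
rng = random.Random(0)
id_train = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(500)]

flow = DiagonalAffineFlow().fit(id_train)
threshold = fit_threshold(flow, id_train)


def is_ood(x):
    return flow.uncertainty(x) > threshold


id_query = [0.1, -0.2, 0.0, 0.3]    # close to the training features
ood_query = [6.0, 6.0, -6.0, 6.0]   # far from the training features
```

In this toy setting, tighter ID features (a smaller `sigma`) raise the likelihood of typical ID queries relative to distant ones, which gives some intuition for why pulling ID query features together can sharpen the boundary.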

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.