Identifying the Unknown: Prompt-Free Open Vocabulary Anomaly Recognition for Robot-Object Interaction

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Robotics & Autonomous Systems, Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

AnomNOVIC is a two-stage framework for prompt-free open vocabulary anomaly recognition in robot-object interaction. It addresses the need for robots to identify previously unseen objects in open-world environments. The system is designed for a known workspace and combines two components. A masked autoencoder (MAE) detects anomalies and produces generic, object-agnostic bounding boxes around them. NOVIC, a real-time, prompt-free open vocabulary image classifier, then names each of these salient image regions without a predefined list of candidate classes. The framework was evaluated in a tabletop robot-object environment with the NICOL humanoid robot. There it achieved 47.1% AP and 57.5% AP50 for prompt-free recognition, rising to 59.0% AP and 72.5% AP50 when class candidates were provided. On an additional in-the-wild test set of 48 unique objects, it reached up to 82.6% prompt-free detection and classification accuracy, outperforming baselines such as YOLO-World-v2, OWLv2, and YOLOE.
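
AP50 counts a detection as correct only when its box overlaps a ground-truth box with an intersection over union (IoU) of at least 0.5. Assuming the usual COCO-style convention, plain AP instead averages over IoU thresholds from 0.50 to 0.95, so it can never exceed AP50. The minimal sketch below shows only the IoU test and greedy matching step behind such metrics. It is not the full precision-recall averaging or the authors' evaluation code. `true_positive_flags` is a hypothetical helper, and exact label equality is a simplification: how the paper scores synonymous open vocabulary labels is not stated here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def true_positive_flags(preds: List[Tuple[Box, str, float]],
                        gts: List[Tuple[Box, str]],
                        thr: float = 0.5) -> List[bool]:
    """Hypothetical helper: greedy matching in descending score order.
    A prediction is a true positive if it overlaps a not-yet-matched
    ground-truth box with the same label at IoU >= thr (AP50 uses thr=0.5).
    Exact label equality is a simplifying assumption."""
    used = set()
    flags = []
    for box, label, _score in sorted(preds, key=lambda p: p[2], reverse=True):
        best_j, best_iou = -1, thr
        for j, (gbox, glabel) in enumerate(gts):
            if j in used or glabel != label:
                continue
            overlap = iou(box, gbox)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used.add(best_j)  # each ground-truth box can be matched once
        flags.append(best_j >= 0)
    return flags


if __name__ == "__main__":
    gts = [((0, 0, 10, 10), "mug")]
    preds = [((1, 1, 10, 10), "mug", 0.9),   # IoU 0.81 -> true positive
             ((0, 0, 10, 10), "mug", 0.8),   # duplicate -> false positive
             ((0, 0, 10, 10), "bowl", 0.95)]  # wrong label -> false positive
    print(true_positive_flags(preds, gts))  # [False, True, False]
```

Raising `thr` toward 0.95 flips loosely placed boxes to false positives. This is why AP averaged over stricter thresholds sits below AP50.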

Key takeaway

For robotics engineers developing systems for open-world autonomy, AnomNOVIC shows that previously unseen objects can be recognized without explicit prompts. Detection pipelines that rely on predefined class lists or manual prompting may be less robust in dynamic environments where novel objects appear. Consider evaluating AnomNOVIC's two-stage approach to improve your robot's ability to detect anomalies and interact with novel items. This could simplify deployment in complex, unstructured settings.

Key insights

AnomNOVIC enables robots to recognize unknown objects efficiently and without prompts by combining an MAE with a real-time open vocabulary classifier.

Principles

Method

AnomNOVIC first uses an MAE to generate generic, object-agnostic bounding boxes around salient (anomalous) image regions. It then passes each region to NOVIC, a real-time, prompt-free open vocabulary classifier, which names the region without a predefined class list.
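
The two-stage flow can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes stage 1 yields a per-pixel reconstruction-error map from an MAE trained on the known workspace. It also assumes that thresholding this map and taking one box per connected region is a fair stand-in for AnomNOVIC's box generation. The names `anomaly_boxes`, `recognize`, and the `classify` callable (a placeholder for NOVIC) are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical types for this sketch (not from the AnomNOVIC paper).
Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive


def anomaly_boxes(error_map: Grid, threshold: float, min_area: int = 1) -> List[Box]:
    """Stage 1 stand-in (assumption): threshold an MAE reconstruction-error map
    and return one object-agnostic box per 4-connected anomalous region."""
    h, w = len(error_map), len(error_map[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or error_map[y][x] <= threshold:
                continue
            # Flood-fill the connected region of above-threshold pixels.
            stack = [(y, x)]
            seen[y][x] = True
            ys, xs = [], []
            while stack:
                cy, cx = stack.pop()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and error_map[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(ys) >= min_area:  # drop tiny regions (likely noise)
                boxes.append((min(xs), min(ys), max(xs) + 1, max(ys) + 1))
    return boxes


def recognize(image: Grid, error_map: Grid, threshold: float,
              classify: Callable[[Grid], str]) -> List[Tuple[Box, str]]:
    """Stage 2 stand-in: crop each box and name it with a prompt-free
    classifier (placeholder for NOVIC); no candidate class list is passed."""
    results = []
    for x0, y0, x1, y1 in anomaly_boxes(error_map, threshold):
        crop = [row[x0:x1] for row in image[y0:y1]]
        results.append(((x0, y0, x1, y1), classify(crop)))
    return results


def fake_classify(crop: Grid) -> str:
    """Hypothetical dummy classifier that only reports the crop size."""
    return f"object_{len(crop)}x{len(crop[0])}"


if __name__ == "__main__":
    err = [
        [0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.9, 0.8, 0.0, 0.0],
        [0.0, 0.7, 0.9, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.6],
    ]
    # The error map doubles as the "image" here for brevity.
    print(recognize(err, err, 0.5, fake_classify))
    # [((1, 1, 3, 3), 'object_2x2'), ((4, 3, 5, 4), 'object_1x1')]
```

`classify` receives only the crop and no candidate labels, which mirrors the prompt-free setting. The variant with provided class candidates would correspond to passing a label list as an extra argument.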

Best for: Research Scientist, Robotics Engineer, AI Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.