VeriGraph: Scene Graphs for Execution Verifiable Robot Planning

2026-04-21 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, long

Summary

VeriGraph is a framework that improves robot task planning by pairing Vision-Language Models (VLMs) with an action verification mechanism. Because VLMs can generate incorrect action sequences, it uses scene graphs as an intermediate representation. The system generates a scene graph from input images, capturing objects and their spatial relationships, and then uses that graph to iteratively check and correct action sequences proposed by an LLM-based task planner, so that physical constraints are met and every action is executable. Across diverse manipulation scenarios, including kitchen, tabletop, and block scenes, VeriGraph improves task completion rates over baseline methods by 58% for language-based tasks and 30% for image-based tasks.

Key takeaway

For research scientists developing robot manipulation systems, VeriGraph demonstrates a robust approach to mitigating VLM planning failures. Consider integrating scene graph representations and iterative verification into your planning frameworks. Checking each proposed action against physical constraints before execution makes complex tasks more reliable, which raises task completion rates and reduces the need for human intervention.

Key insights

VeriGraph uses scene graphs and iterative verification to improve VLM-based robot planning accuracy and executability.

Principles

Scene graphs abstract object details, reducing noise.
Iterative planning with feedback refines action sequences.
Actions can be modeled as graph operations for verification.
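
To make the last principle concrete, here is a minimal sketch of a scene graph whose actions are edge rewrites with precondition checks. It is an illustration only, not VeriGraph's implementation: the `SceneGraph` class, the single `on` relation, the `move` action, and the "must be clear" rules are all assumptions chosen for a simple block scene.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# A relation is a (subject, predicate, object) triple, e.g. ("red", "on", "blue").
Relation = Tuple[str, str, str]


@dataclass
class SceneGraph:
    """Objects as nodes, spatial relations as labeled directed edges (assumed layout)."""
    objects: Set[str] = field(default_factory=set)
    relations: Set[Relation] = field(default_factory=set)

    def support_of(self, obj: str) -> Optional[str]:
        """Return what `obj` rests on, or None if it has no 'on' edge."""
        for subj, pred, target in self.relations:
            if subj == obj and pred == "on":
                return target
        return None

    def is_clear(self, obj: str) -> bool:
        """True when nothing rests on top of `obj`."""
        return all(not (pred == "on" and target == obj)
                   for _, pred, target in self.relations)


def check_move(graph: SceneGraph, obj: str, dest: str) -> Optional[str]:
    """Return None if moving `obj` onto `dest` is allowed, else a reason string."""
    if obj not in graph.objects or dest not in graph.objects:
        return f"unknown object in move({obj}, {dest})"
    if obj == dest:
        return f"cannot place {obj} on itself"
    if not graph.is_clear(obj):
        return f"{obj} has something on top of it"
    # Assumption: the table can hold any number of objects, other supports only one.
    if dest != "table" and not graph.is_clear(dest):
        return f"{dest} is already occupied"
    return None


def apply_move(graph: SceneGraph, obj: str, dest: str) -> SceneGraph:
    """Apply the move as an edge rewrite: drop obj's old 'on' edge, add a new one."""
    kept = {r for r in graph.relations if not (r[0] == obj and r[1] == "on")}
    return SceneGraph(set(graph.objects), kept | {(obj, "on", dest)})


# Hypothetical block scene: red sits on blue; blue and green sit on the table.
scene = SceneGraph(
    objects={"table", "red", "blue", "green"},
    relations={("red", "on", "blue"), ("blue", "on", "table"), ("green", "on", "table")},
)
blocked = check_move(scene, "blue", "green")  # rejected: red is on top of blue
allowed = check_move(scene, "red", "green")   # accepted: both blocks are clear
after = apply_move(scene, "red", "green")     # new graph; `scene` is left unchanged
```

Returning a reason string rather than a bare boolean is a deliberate choice in this sketch: the message can be passed back to a planner as feedback.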

Method

VeriGraph generates an initial scene graph and a goal scene graph from images or language instructions. An iterative planner then proposes actions, and each action is validated against scene graph constraints. If an action is invalid, feedback is returned to the planner for correction, and the loop repeats until the goal state is reached.
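
The propose, validate, and correct loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: the scene graph is reduced to a dictionary mapping each object to its support, `scripted_planner` is a hypothetical stand-in for the LLM-based planner, and the constraint rules and `max_rounds` limit are invented for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]    # object -> what it rests on (a compact scene graph)
Action = Tuple[str, str]  # (object to move, destination)
Planner = Callable[[State, State, List[str]], List[Action]]


def validate(state: State, action: Action) -> Optional[str]:
    """Return None if the action is executable in `state`, else a feedback message."""
    obj, dest = action
    if obj not in state:
        return f"{obj} is not in the scene"
    if obj == dest:
        return f"cannot place {obj} on itself"
    if obj in state.values():
        return f"{obj} is covered; clear it first"
    if dest != "table" and (dest not in state or dest in state.values()):
        return f"{dest} is not a free destination"
    return None


def simulate(start: State, plan: List[Action]) -> Tuple[State, Optional[str]]:
    """Run the plan on a copy of the graph, stopping at the first invalid action."""
    state = dict(start)
    for step, action in enumerate(plan, 1):
        error = validate(state, action)
        if error:
            return state, f"step {step} {action}: {error}"
        state[action[0]] = action[1]
    return state, None


def plan_with_verification(start: State, goal: State, planner: Planner,
                           max_rounds: int = 5) -> Optional[List[Action]]:
    """Ask the planner for plans until one is executable and reaches the goal graph."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        plan = planner(start, goal, feedback)
        final, error = simulate(start, plan)
        if error is None and final == goal:
            return plan
        feedback.append(error or "plan is executable but does not reach the goal")
    return None  # no verified plan within the round limit


def scripted_planner(start: State, goal: State, feedback: List[str]) -> List[Action]:
    """Hypothetical stand-in for an LLM planner: a naive attempt, then a corrected one."""
    if not feedback:
        return [("blue", "green")]  # ignores that red is still on blue
    return [("red", "table"), ("blue", "green")]


start = {"red": "blue", "blue": "table", "green": "table"}
goal = {"red": "table", "blue": "green", "green": "table"}
plan = plan_with_verification(start, goal, scripted_planner)
```

In a real system the planner callable would wrap a model call and include the accumulated feedback in its prompt. Simulating on a copy of the graph means an invalid plan is caught before anything reaches the robot.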

In practice

Use scene graphs for robust object relationship representation.
Implement iterative feedback loops for VLM plan correction.
Represent actions as graph manipulations for efficient constraint checking.

Topics

VeriGraph Framework
Scene Graphs
Robot Task Planning
Execution Verifiability
Vision-Language Models

Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.