Learning Lifted Action Models from Unsupervised Visual Traces

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Computer Vision & Image Processing · Depth: Expert, extended

Summary

This research introduces a deep learning framework that learns lifted action models from sequences of state images, without requiring action observations. The framework jointly learns state prediction, action prediction, and a lifted action model. To counter prediction collapse and self-reinforcing errors, the authors integrate a mixed-integer linear program (MILP) as an external logical consistency checker. The MILP corrects a subset of the predicted traces, together with the action model, so that they satisfy logical planning constraints; the corrected outputs serve as pseudo-labels that guide further neural network training. Experiments across five domains (Blocksworld, Gripper, Logistics, Hanoi, 8-puzzle) using two visual representations (MNIST/EMNIST grids and PDDLGym synthesized images) show that MILP-based correction significantly improves convergence and logical consistency, often recovering ground-truth action models without error. The implementation uses PyTorch 2.6 and Gurobi 12.0.1 and trains for 5,000 epochs with the Adam optimizer.
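To make "lifted action model" concrete, here is a minimal sketch of one lifted schema being grounded and applied to a symbolic state. It assumes a STRIPS-style representation (preconditions, add effects, delete effects); the summary does not state the paper's exact formalism, and the `LiftedAction` class, the `unstack` schema, and the predicate names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass

# An atom is a tuple: (predicate_name, arg1, arg2, ...).
# Lifted atoms use variables such as "?x"; grounded atoms use object names.

@dataclass(frozen=True)
class LiftedAction:
    """Hypothetical STRIPS-style lifted schema (an assumption, not the paper's class)."""
    name: str
    params: tuple
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset

def ground(atoms, binding):
    """Substitute objects for variables in a set of lifted atoms."""
    return frozenset((a[0],) + tuple(binding[v] for v in a[1:]) for a in atoms)

def apply_action(action, binding, state):
    """Return the successor state, or None if the preconditions do not hold."""
    if not ground(action.preconditions, binding) <= state:
        return None
    removed = ground(action.del_effects, binding)
    added = ground(action.add_effects, binding)
    return (state - removed) | added

# Illustrative Blocksworld-like schema: lift block ?x off block ?y.
unstack = LiftedAction(
    name="unstack",
    params=("?x", "?y"),
    preconditions=frozenset({("on", "?x", "?y"), ("clear", "?x"), ("handempty",)}),
    add_effects=frozenset({("holding", "?x"), ("clear", "?y")}),
    del_effects=frozenset({("on", "?x", "?y"), ("clear", "?x"), ("handempty",)}),
)

state = frozenset({("on", "a", "b"), ("ontable", "b"), ("clear", "a"), ("handempty",)})
next_state = apply_action(unstack, {"?x": "a", "?y": "b"}, state)
```

Because the schema is written over variables rather than specific blocks, the same definition covers every pair of objects, which is what distinguishes a lifted model from a table of grounded actions.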

Key takeaway

For research scientists developing AI planning agents, this work demonstrates a method for learning interpretable action models from raw visual data. Consider integrating a mixed-integer linear program (MILP) as a logical consistency module within your deep learning framework to prevent prediction collapse and escape local optima, especially when action supervision is unavailable. In the reported experiments, this correction improved convergence and the logical consistency of the learned models.
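The idea of a logical consistency check can be illustrated without a solver. The sketch below is a brute-force stand-in, not a MILP and not the authors' formulation: given two symbolic states and a set of candidate lifted schemas, it enumerates groundings and returns every action that explains the transition. The `SCHEMAS` table, the `explain` function, and the predicate names are all hypothetical.

```python
from itertools import permutations

# Hypothetical lifted schemas: name -> (params, preconditions, add effects, delete effects).
SCHEMAS = {
    "pickup": (
        ("?x",),
        {("ontable", "?x"), ("clear", "?x"), ("handempty",)},
        {("holding", "?x")},
        {("ontable", "?x"), ("clear", "?x"), ("handempty",)},
    ),
    "putdown": (
        ("?x",),
        {("holding", "?x")},
        {("ontable", "?x"), ("clear", "?x"), ("handempty",)},
        {("holding", "?x")},
    ),
}

def ground(atoms, binding):
    """Substitute objects for variables in a set of lifted atoms."""
    return frozenset((a[0],) + tuple(binding[v] for v in a[1:]) for a in atoms)

def explain(state, next_state, objects):
    """Return every grounded action (name, args) consistent with state -> next_state."""
    matches = []
    for name, (params, pre, add, delete) in SCHEMAS.items():
        for args in permutations(objects, len(params)):
            binding = dict(zip(params, args))
            if not ground(pre, binding) <= state:
                continue  # preconditions violated, so this grounding cannot apply
            predicted = (state - ground(delete, binding)) | ground(add, binding)
            if predicted == next_state:
                matches.append((name, args))
    return matches

s0 = frozenset({("ontable", "a"), ("ontable", "b"),
                ("clear", "a"), ("clear", "b"), ("handempty",)})
s1 = frozenset({("holding", "a"), ("ontable", "b"), ("clear", "b")})
labels = explain(s0, s1, ["a", "b"])
```

An empty result signals that no action in the current model explains the predicted transition. That is the kind of inconsistency a correction step has to resolve, either by changing the predicted states or by changing the model. Enumeration like this scales poorly with the number of objects and schemas, which is one reason to hand the problem to an optimization solver instead.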

Key insights

A neuro-symbolic framework learns lifted action models from visual traces without action supervision, using MILP for logical consistency.

Method

A deep learning framework jointly predicts states, actions, and the action model. Periodically, a MILP corrects the predictions on a subset of traces; the corrected outputs become pseudo-labels that supervise the neural model's continued training. Older pseudo-labels are down-weighted by exponential decay.
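The exponential decay of older pseudo-labels can be sketched as a per-label weight in the supervised loss. This is a minimal illustration under stated assumptions: the summary says only that decay is exponential, so the specific form `exp(-decay_rate * age)`, the `decay_rate` value, the function names, and the weight normalization are all hypothetical.

```python
import math

def pseudo_label_weight(label_epoch, current_epoch, decay_rate=0.01):
    """Exponentially down-weight a pseudo-label by its age in epochs.

    decay_rate is a hypothetical hyperparameter, not a value from the paper.
    """
    age = current_epoch - label_epoch
    return math.exp(-decay_rate * age)

def weighted_pseudo_label_loss(predictions, pseudo_labels, current_epoch, decay_rate=0.01):
    """Weighted mean negative log-likelihood over pseudo-labelled steps.

    predictions: list of dicts mapping action label -> predicted probability
    pseudo_labels: list of (label, epoch_created) pairs from the correction step
    """
    total, weight_sum = 0.0, 0.0
    for probs, (label, created) in zip(predictions, pseudo_labels):
        w = pseudo_label_weight(created, current_epoch, decay_rate)
        # Clamp the probability to avoid log(0) for labels the model rules out.
        total += -w * math.log(max(probs.get(label, 0.0), 1e-12))
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0

# Usage: one fresh pseudo-label and one created 200 epochs earlier.
preds = [{"pickup": 0.9, "putdown": 0.1}, {"pickup": 0.2, "putdown": 0.8}]
labels = [("pickup", 1000), ("pickup", 800)]
loss = weighted_pseudo_label_loss(preds, labels, current_epoch=1000)
```

With this weighting, a stale label that the network now disagrees with contributes less to the loss than a fresh one, so an early correction that later proves unhelpful loses influence over time.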

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.