Multi-Objective Constraint Inference using Inverse Reinforcement Learning

Source: cs.AI updates on arXiv.org · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Multi-Objective Constraint Inference (MOCI) is a novel framework for extracting shared constraints and individual preferences from heterogeneous expert demonstrations in reinforcement learning. Unlike traditional methods, which assume homogeneous expert behavior and known reward functions, MOCI can learn from diverse, potentially conflicting trajectories. The framework iteratively clusters expert trajectories, learns personalized reward weights for each cluster using Maximum Entropy Inverse Reinforcement Learning, and identifies shared hard constraints by evaluating state log-likelihoods. In empirical evaluations on a multi-objective Gridworld benchmark, MOCI reaches a Mean Squared Error (MSE) of 0.027, a higher predictive accuracy than baselines such as Maximum Likelihood Constraint Inference (MLCI) and Inverse Constrained Reinforcement Learning (ICRL). MOCI also remains computationally competitive, scaling as $\mathcal{O}(|\mathcal{S}|\cdot|\mathcal{A}|\cdot H)$, where $|\mathcal{S}|$ and $|\mathcal{A}|$ are the numbers of states and actions and $H$ is the time horizon, and it recovers both the ground-truth constraints and the distinct expert preferences.
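To make the likelihood-based constraint step and the $\mathcal{O}(|\mathcal{S}|\cdot|\mathcal{A}|\cdot H)$ cost more concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a tabular MDP with deterministic transitions (as in a Gridworld), one known state-reward vector per expert type, demonstrations already assigned to types, and hard constraints modeled as forbidden states. A finite-horizon soft (MaxEnt) backward pass touches each state-action pair once per time step, which is one way a cost of that form arises; a greedy loop then forbids whichever undemonstrated state most increases the total demonstration log-likelihood summed over all types, so the constraints are shared. The function names and the `min_gain` stopping rule are hypothetical stand-ins for whatever MOCI actually uses.

```python
import numpy as np


def soft_backward_pass(reward, next_state, horizon, constrained):
    """Finite-horizon MaxEnt (soft) value iteration, deterministic tabular MDP.

    reward:      (S,) state reward for one expert type
    next_state:  (S, A) integer successor table
    constrained: (S,) boolean mask; forbidden states get value -inf
    Returns Q of shape (H, S, A). Each of the H steps touches every (s, a)
    pair once, so one pass costs O(|S| * |A| * H).
    """
    num_states, num_actions = next_state.shape
    q = np.empty((horizon, num_states, num_actions))
    v_next = np.where(constrained, -np.inf, 0.0)  # terminal values V_H
    for t in reversed(range(horizon)):
        q[t] = reward[:, None] + v_next[next_state]
        v = np.logaddexp.reduce(q[t], axis=1)  # soft maximum over actions
        v_next = np.where(constrained, -np.inf, v)
    return q


def demo_log_likelihood(q, states, actions):
    """Sum of log pi_t(a_t | s_t) under the MaxEnt policy implied by q."""
    total = 0.0
    for t, (s, a) in enumerate(zip(states, actions)):
        total += q[t, s, a] - np.logaddexp.reduce(q[t, s])
    return total


def greedy_shared_constraints(rewards, next_state, horizon, demos,
                              min_gain=1e-6, max_constraints=10):
    """Greedily add forbidden states shared by all expert types.

    rewards:  (K, S) one reward vector per expert type (assumed already learned)
    demos:    list of (type_index, states, actions), len(states) == len(actions) + 1
    min_gain: hypothetical stopping threshold on the log-likelihood increase
    """
    num_states = next_state.shape[0]
    constrained = np.zeros(num_states, dtype=bool)
    visited = {s for _, states, _ in demos for s in states}

    def total_ll(mask):
        qs = [soft_backward_pass(r, next_state, horizon, mask) for r in rewards]
        return sum(demo_log_likelihood(qs[k], states, actions)
                   for k, states, actions in demos)

    current = total_ll(constrained)
    chosen = []
    while len(chosen) < max_constraints:
        best_gain, best_state, best_ll = 0.0, None, current
        for s in range(num_states):
            if constrained[s] or s in visited:  # never forbid a demonstrated state
                continue
            trial = constrained.copy()
            trial[s] = True
            ll = total_ll(trial)
            if ll - current > best_gain:
                best_gain, best_state, best_ll = ll - current, s, ll
        if best_state is None or best_gain < min_gain:
            break
        constrained[best_state] = True
        chosen.append(best_state)
        current = best_ll
    return chosen, current


# Usage: a five-state chain; actions are 0 = left, 1 = right, 2 = stay.
next_state = np.array([[max(s - 1, 0), min(s + 1, 4), s] for s in range(5)])
rewards = np.array([[1.0, 0.0, 0.0, 0.0, 0.0]])  # one expert type; state 0 is attractive
demos = [(0, [2, 3, 4, 4], [1, 1, 2])]           # yet the expert never heads left
chosen, ll = greedy_shared_constraints(rewards, next_state, horizon=3, demos=demos)
# chosen == [1]: forbidding state 1 also cuts off state 0
```

Forbidding state 1 removes every trajectory that passes through state 1 or state 0, so it raises the likelihood more than forbidding state 0 alone; once state 1 is forbidden, state 0 is unreachable and forbidding it adds nothing, so the loop stops. Because constraints only remove trajectories the demonstrations never use, each accepted constraint can only raise the demonstration log-likelihood.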

Key takeaway

For research scientists developing safe reinforcement learning agents in complex, multi-expert environments, MOCI offers a way to infer both shared constraints and individual preferences from heterogeneous demonstration data. Consider MOCI when predictive accuracy matters and expert behavior is diverse, especially where traditional methods fall short because they assume homogeneous experts. Its performance depends on two settings that need tuning: the hyperparameter $K$ (the number of expert types) and the KL-divergence threshold $d_{DKL}$.
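The summary does not spell out how $d_{DKL}$ is used. One plausible reading, suggested by its name, is a stopping test: keep adding constraints until the KL divergence between the demonstrations' empirical visitation distribution and the model's predicted one falls below $d_{DKL}$. The minimal Python sketch below implements only that check; the function names, the use of state visitation vectors, and the direction of the divergence are all assumptions.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two nonnegative visitation vectors (normalized here)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    support = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[support] * np.log(p[support] / np.maximum(q[support], eps))))


def stop_adding_constraints(demo_visits, model_visits, d_dkl):
    """Hypothetical test: the model already matches the demonstrations closely enough."""
    return kl_divergence(demo_visits, model_visits) <= d_dkl


# Usage: the model spreads mass over a state the demonstrations never visit.
done = stop_adding_constraints([1.0, 0.0], [0.5, 0.5], d_dkl=0.1)
# done is False: the divergence is log 2 (about 0.693), above the threshold
```

A smaller threshold keeps the greedy loop running longer and risks adding spurious constraints; a larger one stops early and may miss real ones.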

Key insights

MOCI infers shared constraints and individual preferences from diverse expert demonstrations using an EM-based approach.
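The clustering half of such an EM loop can be sketched as a standard mixture-model E-step: given each trajectory's log-likelihood under each of the $K$ expert types' current reward weights, compute normalized responsibilities and re-estimate the type proportions. This Python sketch is an assumption about how the clustering could work; the summary does not say whether MOCI uses soft responsibilities or hard assignments, and the function names are hypothetical.

```python
import numpy as np


def e_step(log_lik, log_mix):
    """Responsibilities gamma[n, k] = P(type k | trajectory n).

    log_lik: (N, K) log-likelihood of trajectory n under type k's reward weights
    log_mix: (K,) log prior probability of each expert type
    """
    joint = log_lik + log_mix[None, :]
    log_norm = np.logaddexp.reduce(joint, axis=1, keepdims=True)  # stable normalizer
    return np.exp(joint - log_norm)


def update_mixing(resp):
    """M-step for the type proportions: average responsibility per type."""
    return resp.mean(axis=0)


# Usage: two trajectories, K = 2 expert types, equal priors.
log_lik = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
resp = e_step(log_lik, np.log(np.array([0.5, 0.5])))
hard_labels = resp.argmax(axis=1)  # hard clustering, if preferred
mix = update_mixing(resp)
```

Working in log space matters here because trajectory log-likelihoods over long horizons are large negative numbers whose exponentials would underflow.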

Principles

Method

MOCI uses an Expectation-Maximization approach to iteratively cluster trajectories, update personalized reward weights via gradient ascent, and greedily add hard constraints based on log-likelihood improvement.
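The reward-weight update can be sketched as one gradient-ascent step on a responsibility-weighted MaxEnt IRL objective. For a linear state reward $r(s) = \phi(s)^\top w_k$, the standard MaxEnt IRL gradient is the demonstrations' feature expectations minus the model's expected feature expectations. This Python sketch assumes linear rewards, weights each trajectory's visitation counts by its responsibility for type $k$, and takes the model's expected visitation counts as an input (in practice they would come from a forward pass under the current weights and constraints); the names and the fixed learning rate are hypothetical.

```python
import numpy as np


def reward_weight_step(weights, features, demo_visits, resp_k, model_visits, lr=0.1):
    """One MaxEnt IRL gradient-ascent step for expert type k.

    weights:      (D,) current weights w_k; reward r(s) = features[s] @ w_k
    features:     (S, D) state feature matrix phi
    demo_visits:  (N, S) state visitation counts of each demonstration
    resp_k:       (N,) responsibility of type k for each demonstration
    model_visits: (S,) expected visitation counts under w_k and current constraints
    """
    # Responsibility-weighted average of the demonstrations' visitation counts.
    empirical_visits = resp_k @ demo_visits / resp_k.sum()
    # Log-likelihood gradient: empirical minus expected feature counts.
    grad = features.T @ (empirical_visits - model_visits)
    return weights + lr * grad, grad


# Usage: three one-hot state features, two demonstrations, type k owns the first.
features = np.eye(3)
demo_visits = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 2.0]])
resp_k = np.array([1.0, 0.0])
model_visits = np.array([1.0, 1.0, 1.0])
new_weights, grad = reward_weight_step(np.zeros(3), features, demo_visits, resp_k, model_visits)
```

The type-$k$ weights move toward the state its demonstrations visit more often than the model predicts (state 0) and away from the one the model overpredicts (state 2); the gradient vanishes once the two visitation profiles match.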

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.