The Sim-to-Real Gap of Foundation Model Agents: A Unified MDP Perspective

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, extended

Summary

Foundation model agents are increasingly deployed for real-world decision-making, but they suffer from a sim-to-real gap: behavior that holds up in training and evaluation settings can break down in deployment. This 2026 paper formalizes that evaluation and training gap as a classical sim-to-real problem, structured around four elements of a Markov Decision Process (MDP): Observation, Action, Transition, and Reward. It argues against treating agent robustness as a novel phenomenon and advocates adopting established solutions from robotics and classical control, such as domain randomization. The research agenda translates each classical discrepancy into the foundation model domain and gives concrete examples such as multilingual tool calling. For instance, when instructions were switched from English to Chinese, error rates rose from 13.5% to 28.5% for GPT5 and from 5.5% to 46.5% for Qwen-Next-80B, an increase attributed to a language mismatch in parameter values. The ultimate goal is a unified vocabulary and standardized stress-test benchmarks for trustworthy agents.
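To put the quoted error rates in perspective, the short sketch below computes the absolute increase (in percentage points) and the relative multiplier for each model. The four percentages come from the summary above; the helper name `gap` and the dictionary layout are illustrative choices, not anything defined by the paper.

```python
# Error rates in percent, as quoted in the summary: (English, Chinese).
error_rates = {
    "GPT5": (13.5, 28.5),
    "Qwen-Next-80B": (5.5, 46.5),
}

def gap(source_pct, target_pct):
    """Return (increase in percentage points, relative multiplier)."""
    return target_pct - source_pct, target_pct / source_pct

for model, (english, chinese) in error_rates.items():
    points, ratio = gap(english, chinese)
    print(f"{model}: +{points:.1f} points, {ratio:.2f}x the English error rate")
```

Read this way, the smaller model's gap is much larger on both measures, even though its English error rate is the lower of the two.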

Key takeaway

Machine learning engineers deploying foundation model agents should address the sim-to-real gap proactively by adopting established MDP-based frameworks. Evaluation should systematically stress-test agents across observation, action, transition, and reward discrepancies. This approach, together with techniques like domain randomization, helps catch critical failures before deployment and makes agents more robust and trustworthy in production.

Key insights

The paper argues that the sim-to-real gap in foundation model agents can be addressed by applying established Markov Decision Process frameworks from classical control and robotics.

Principles

Method

The proposed method decomposes the evaluation and training gaps of foundation model (FM) agents into four MDP elements (Observation, Action, Transition, Reward) and applies a classical mitigation technique, such as domain randomization or grounded action transformation, to each.
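The decomposition can be sketched as an evaluation harness that randomizes one perturbation axis per MDP element and reports failure rates per axis value. This is a minimal sketch under stated assumptions, not the paper's implementation: the axis values in `AXES`, the `toy_agent` stand-in, and the `stress_test` function are all hypothetical, and domain randomization is applied here on the evaluation side rather than during training.

```python
import random
from collections import defaultdict

# Hypothetical perturbation axes, one per MDP element. The values are
# illustrative assumptions, not settings taken from the paper.
AXES = {
    "observation": ["english", "chinese"],            # instruction language
    "action": ["snake_case_args", "camelCase_args"],  # tool-call schema
    "transition": [0.0, 0.2],                         # tool failure probability
    "reward": ["exact_match", "lenient_match"],       # success criterion
}

def sample_config(rng):
    """Domain randomization: draw one value per axis for each episode."""
    return {axis: rng.choice(values) for axis, values in AXES.items()}

def toy_agent(config, rng):
    """Stand-in for a real agent rollout; returns True on success.

    It is deliberately built to fail on Chinese instructions so that
    the report below has a visible signal.
    """
    if config["observation"] == "chinese":
        return False
    # Otherwise the episode fails only when the simulated tool call fails.
    return rng.random() >= config["transition"]

def stress_test(agent, episodes=1000, seed=0):
    """Return the observed failure rate for each (axis, value) pair."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0, 0])  # (axis, value) -> [failures, trials]
    for _ in range(episodes):
        config = sample_config(rng)
        failed = not agent(config, rng)
        for axis, value in config.items():
            counts[(axis, value)][0] += failed
            counts[(axis, value)][1] += 1
    return {key: fails / trials for key, (fails, trials) in counts.items()}

report = stress_test(toy_agent)
for (axis, value), rate in sorted(report.items(), key=lambda kv: str(kv[0])):
    print(f"{axis:12s} {str(value):18s} failure rate {rate:.2f}")
```

Grouping failures by axis value makes it easier to see which MDP element a weakness belongs to. In a real harness, `toy_agent` would be replaced by an actual agent rollout and the axes by perturbations relevant to the deployment setting.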

Best for: Research Scientist, NLP Engineer, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.