SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

SimGym is a framework developed by Shopify for simulating A/B tests on e-commerce storefronts using vision-language model (VLM) agents. It addresses the limitations of traditional A/B testing, which diverts live traffic, takes weeks to reach statistical significance, and risks degrading the user experience. The framework combines three components: a traffic-grounded persona generation pipeline; a live-browser agent architecture with multimodal perception, episodic memory, and guardrails; and an evaluation protocol. Validated on 50 real-world UI theme changes across diverse storefronts, SimGym agents achieved 77% directional alignment with observed add-to-cart shifts in real buyer traffic. Simulated experiments complete in under an hour rather than weeks and do not expose real buyers to untested designs.

Key takeaway

For e-commerce product managers and ML engineers evaluating storefront UI changes, SimGym offers a way to de-risk and accelerate experimentation. You can pre-test visually driven theme changes with synthetic VLM agents in under an hour. In validation, these agents achieved 77% directional alignment with the add-to-cart shifts observed in real buyer traffic. This lets you screen out weak designs and prioritize promising variants for live A/B tests, which lowers experimentation cost and shortens iteration cycles without affecting the real customer experience.

Key insights

SimGym enables rapid simulation of A/B tests for e-commerce UI changes, using VLM agents grounded in real traffic data and without exposing real buyers to untested variants.

Method

SimGym's method has three stages: generating synthetic buyer personas from clickstream data, deploying VLM agents in a live browser with multimodal perception and memory, and comparing simulated outcomes against the shifts observed in real A/B tests.
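
The evaluation stage can be illustrated with a minimal sketch of a directional alignment metric: the fraction of experiments in which the simulated add-to-cart shift has the same sign as the shift observed in real traffic. This is an assumption about how such a metric could be computed, not the paper's implementation. The `Experiment` class, the function names, and the sample numbers are all hypothetical, and treating a zero shift as matching only another zero shift is a choice made here for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """One UI change with its real and simulated add-to-cart shifts.

    Hypothetical structure for illustration; not taken from the paper.
    """
    name: str
    real_delta: float  # observed add-to-cart rate change (variant - control)
    sim_delta: float   # simulated add-to-cart rate change from the agents


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def directional_alignment(experiments: list[Experiment]) -> float:
    """Fraction of experiments where simulated and real shifts share a sign."""
    if not experiments:
        raise ValueError("need at least one experiment")
    matches = sum(
        1 for e in experiments if sign(e.real_delta) == sign(e.sim_delta)
    )
    return matches / len(experiments)


# Made-up numbers, purely for illustration.
experiments = [
    Experiment("theme_a", real_delta=0.012, sim_delta=0.020),    # both up
    Experiment("theme_b", real_delta=-0.004, sim_delta=-0.001),  # both down
    Experiment("theme_c", real_delta=0.007, sim_delta=-0.003),   # disagree
    Experiment("theme_d", real_delta=-0.010, sim_delta=-0.015),  # both down
]

print(directional_alignment(experiments))  # 0.75
```

A sign-only metric ignores the magnitude of each shift, so it suits screening decisions (is a variant likely to help or hurt?) better than effect-size estimation.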

Best for: Research Scientist, AI Product Manager, Product Manager, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.