SAGE: Semantic-Aware Gray-Box Game Regression Testing with Large Language Models

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Gaming & Interactive Media · Depth: Expert, extended

Summary

SAGE is a semantic-aware gray-box regression testing framework for modern live-service games. It targets three pain points: manual test case construction, test suite maintenance, and test prioritization. First, it uses LLM-guided reinforcement learning for efficient, goal-oriented exploration, automatically generating a diverse foundational test suite. Second, it applies semantic-based multi-objective optimization to refine that suite into a compact, high-value subset that balances cost, coverage, and rarity. Third, it uses LLM-based semantic analysis of update logs to prioritize the test cases relevant to each version change. Evaluated on Overcooked Plus and Minecraft, SAGE detected approximately 1.6x more unique bugs than automated baselines while reducing execution time by 75-90%. All LLM tasks use GPT-4o.
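The prioritization step can be illustrated with a minimal sketch. It assumes the LLM has already reduced an update log to a set of affected feature tags, and it ranks tests by plain tag overlap, which is a deliberately simple stand-in for SAGE's LLM-based semantic analysis. The function name, test names, and tags are all hypothetical.

```python
def prioritize(tests: dict[str, set[str]], changed: set[str]) -> list[str]:
    """Order test names by how many changed features they touch, most relevant first.

    Tag overlap is a simplification, not the paper's semantic matching.
    """
    # sorted() is stable, so tests with equal overlap keep their original order.
    return sorted(tests, key=lambda name: len(tests[name] & changed), reverse=True)


# Hypothetical test suite: each test is labeled with the features it exercises.
tests = {
    "trade_villager": {"trading"},
    "mine_ore": {"mining", "inventory"},
    "craft_pickaxe": {"crafting", "inventory"},
}

# Hypothetical tags, as if extracted from an update log that mentions
# crafting recipes and inventory stacking.
changed = {"crafting", "inventory"}

ranked = prioritize(tests, changed)
print(ranked)
```

Tests that touch none of the changed features stay in the list but fall to the end, so a time-boxed regression run would reach the most relevant cases first.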

Key takeaway

For game QA teams managing live-service titles, SAGE automates regression testing in gray-box environments. Its LLM-guided test generation and semantic-aware prioritization reduce both manual effort and execution cost. Multi-objective optimization keeps the test suite compact and high-value across frequent game updates, so critical bugs are still detected without excessive overhead.

Key insights

SAGE uses LLMs to orchestrate semantic-aware gray-box game regression testing, improving bug detection and efficiency.

Method

SAGE generates seed trajectories with LLMs, trains an RL agent for guided exploration, constructs a state-action graph, optimizes paths via multi-objective selection, and prioritizes tests using LLM-extracted update log semantics.
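The multi-objective selection step can be sketched as a Pareto filter over candidate paths. This is a minimal illustration under stated assumptions, not the paper's optimizer: the `TestPath` fields, the scoring units, and the example paths are hypothetical, and a simple non-dominated filter stands in for whatever search procedure SAGE actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestPath:
    name: str
    cost: float     # assumed: estimated execution time in seconds (lower is better)
    coverage: int   # assumed: number of distinct states covered (higher is better)
    rarity: float   # assumed: score for rarely visited states (higher is better)


def dominates(a: TestPath, b: TestPath) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = a.cost <= b.cost and a.coverage >= b.coverage and a.rarity >= b.rarity
    better = a.cost < b.cost or a.coverage > b.coverage or a.rarity > b.rarity
    return no_worse and better


def pareto_front(paths: list[TestPath]) -> list[TestPath]:
    """Keep only the paths that no other path dominates."""
    return [p for p in paths if not any(dominates(q, p) for q in paths)]


# Hypothetical candidate paths extracted from a state-action graph.
candidates = [
    TestPath("cook_soup", cost=30.0, coverage=12, rarity=0.2),
    TestPath("cook_soup_slow", cost=45.0, coverage=12, rarity=0.2),
    TestPath("burn_pot", cost=40.0, coverage=8, rarity=0.9),
    TestPath("idle_walk", cost=35.0, coverage=5, rarity=0.1),
]

selected = pareto_front(candidates)
print([p.name for p in selected])
```

Here `cook_soup_slow` and `idle_walk` are dropped because `cook_soup` matches or beats them on all three objectives, while `burn_pot` survives because its rarity score is not matched by any cheaper path.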

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.