GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents

2026-06-18 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Expert, extended

Summary

GrowthHacker is a benchmark system that automates the optimization of Off-Policy Evaluation (OPE) using code-modifying LLM agents. OPE, also known as offline A/B testing, is critical for data-driven software development in domains where online experiments are resource-intensive or risky, such as healthcare and recommender systems. The system iteratively refines code, applies the changes, and evaluates the resulting performance without human intervention. A study of 504 experimental runs on the Open Bandit Pipeline (OBP) and Scope-RL datasets benchmarked GrowthHacker's "two_agent" framework against AutoGen, CrewAI, and a default LLM. The "two_agent" framework achieved 100% reliability, a 45% positive outcome rate, and a 106.7% average improvement among positive outcomes. CrewAI matched the 45% positive outcome rate but with a lower average improvement of 31.7%, while AutoGen reached a 34% positive outcome rate. These results demonstrate the feasibility of LLM-based agents as automated "growth hackers" for OPE systems.
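
The two headline metrics in the summary can be read as simple arithmetic over per-run results. The sketch below assumes each run yields a relative change (in percent) of the OPE metric versus a baseline, with positive meaning better. The function name `summarize_runs`, the input format, and the rule that a "positive outcome" is any change strictly above zero are illustrative assumptions, not definitions taken from the paper.

```python
from statistics import mean


def summarize_runs(relative_changes):
    """Summarize per-run relative changes (in percent) versus a baseline.

    Assumption: a run counts as a "positive outcome" when its relative
    change is strictly greater than zero.
    """
    if not relative_changes:
        raise ValueError("need at least one run")
    positives = [c for c in relative_changes if c > 0]
    # Share of all runs that improved on the baseline.
    positive_rate = 100.0 * len(positives) / len(relative_changes)
    # Average gain, conditional on the run having improved at all.
    avg_improvement = mean(positives) if positives else 0.0
    return positive_rate, avg_improvement


# Made-up numbers for illustration only (not results from the study).
example_changes = [50.0, -10.0, 150.0, 0.0]
rate, avg_gain = summarize_runs(example_changes)
print(f"positive outcome rate: {rate:.1f}%")
print(f"average improvement among positive outcomes: {avg_gain:.1f}%")
```

Because the average is conditional on improved runs, a large average improvement can coexist with a positive outcome rate below 50%, which is the pattern the summary reports for the "two_agent" framework.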

Key takeaway

MLOps engineers managing data-driven experimentation should consider integrating LLM-based agents for automated Off-Policy Evaluation (OPE) optimization. The "two_agent" framework demonstrated 100% reliability and a 106.7% average improvement among positive outcomes, outperforming the other multi-agent systems tested. This approach can reduce manual effort and accelerate data-driven decision-making, but watch for framework-specific failure patterns and library sensitivities, especially in continuous action spaces.

Key insights

LLM-based agents can autonomously optimize Off-Policy Evaluation code, significantly improving performance and reliability in data-driven systems.

Principles

Specialized agent architectures enhance reliability.
Iterative code optimization avoids context degradation.
Framework-method pairings are critical for success.

Method

GrowthHacker uses a two-agent framework (Analyzer and Coder) for iterative code optimization. The Analyzer proposes changes and the Coder implements them. The two agents communicate through files, and the best iteration is selected post hoc once all iterations have run.
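
The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not GrowthHacker's implementation: the agents are plain Python callables standing in for LLM calls, the file names (`plan_<i>.txt`, `code_<i>.py`) are hypothetical, and the score is assumed to be an error-like metric where lower is better.

```python
import tempfile
from pathlib import Path


def run_two_agent_loop(initial_code, analyzer, coder, evaluate, workdir, iterations=3):
    """Iterate Analyzer -> Coder -> evaluate, then pick the best iteration post hoc."""
    workdir = Path(workdir)
    code = initial_code
    history = [(0, evaluate(code), code)]  # iteration 0 is the unmodified baseline
    for i in range(1, iterations + 1):
        # Analyzer writes its proposed change to a file (file-based hand-off).
        plan_path = workdir / f"plan_{i}.txt"
        plan_path.write_text(analyzer(code, history[-1][1]))
        # Coder reads the plan from disk and writes the modified code to a file.
        code_path = workdir / f"code_{i}.py"
        code_path.write_text(coder(code, plan_path.read_text()))
        code = code_path.read_text()
        history.append((i, evaluate(code), code))
    # Post-hoc selection; assumes a lower score is better (e.g. an estimation error).
    return min(history, key=lambda item: item[1])


# Toy stand-ins for the LLM agents and the OPE evaluation (all hypothetical).
def toy_evaluate(code):
    return abs(int(code.split("=")[1]) - 3)  # pretend the error is smallest at n_folds = 3


def toy_analyzer(code, last_score):
    return "increment n_folds by 1"


def toy_coder(code, plan):
    return f"n_folds = {int(code.split('=')[1]) + 1}"


with tempfile.TemporaryDirectory() as tmp:
    best_iter, best_score, best_code = run_two_agent_loop(
        "n_folds = 1", toy_analyzer, toy_coder, toy_evaluate, tmp, iterations=4
    )
print(best_iter, best_score, best_code)
```

In this toy run the later iterations overshoot the optimum, so selecting the best iteration after the fact returns an intermediate version rather than the last one.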

In practice

Implement a two-agent architecture for code optimization.
Prioritize agent-applied partial code modifications.
Account for library-specific parameter sensitivities.
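
One way to read "partial code modifications" is that the agent changes a targeted snippet instead of regenerating the whole file. The sketch below assumes a simple search-and-replace edit format and rejects edits that do not match exactly once. The helper `apply_partial_edit` and the sample source text (including `make_estimator`) are hypothetical placeholders, not part of GrowthHacker, OBP, or Scope-RL.

```python
def apply_partial_edit(source, old, new):
    """Replace exactly one occurrence of `old` with `new`; refuse ambiguous edits."""
    count = source.count(old)
    if count != 1:
        # Zero matches means the edit is stale; several matches means it is ambiguous.
        raise ValueError(f"expected exactly one match for the edit, found {count}")
    return source.replace(old, new, 1)


# Placeholder script text; only the targeted line should change.
original = "estimator = make_estimator()\nn_folds = 2\nrandom_state = 0\n"
patched = apply_partial_edit(original, "n_folds = 2", "n_folds = 5")
print(patched)
```

Keeping edits small and verifiable leaves the rest of the script untouched, which makes a failed iteration easier to diagnose and roll back.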

Topics

Off-Policy Evaluation
LLM Agents
Code Optimization
A/B Testing
Reinforcement Learning
Growth Hacking

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.