JETO-Bench: A Reproducible Benchmark for Execution Time Improvement Patches in Java

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

The paper introduces JETO-Mine, described as the first configurable and reusable tool for creating reproducible benchmarks of execution time improvement patches (ETIPs) in real-world Java projects. It uses a three-phase pipeline: (1) static analysis, which identifies ETIPs in GitHub repositories using user-defined filters and an LLM-based classifier; (2) dynamic analysis, which wraps each ETIP in a Docker image for reproducible execution and statistical testing; and (3) an evaluation harness for quantitative assessment. Using JETO-Mine, the authors built JETO-Bench, comprising 660 identified ETIPs and 91 manually verified, executable ETIPs from 174 open-source Java repositories, covering 11 years and nearly 1.8 million commits. Evaluated on JETO-Bench, OpenHands correctly fixed 14.3% (13/91) of the issues, in line with results reported for other languages. The study also finds that open-source Java projects rarely include tests that demonstrate execution time improvements.
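
The static phase can be illustrated with a small Java sketch. It assumes a hypothetical keyword list and a hypothetical `Classifier` interface standing in for the LLM-based classifier; neither is JETO-Mine's actual configuration or API. The point it shows is ordering: cheap user-defined filters run first, so the expensive LLM call only sees commits that survive them.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical sketch of a static-analysis prefilter for candidate ETIPs.
// The keyword list and the Classifier interface are illustrative assumptions,
// not JETO-Mine's actual configuration or API.
public class EtipPrefilter {

    /** Minimal commit view: SHA, message, and paths of the files it touches. */
    public record Commit(String sha, String message, List<String> changedFiles) {}

    /** Hypothetical stand-in for the LLM-based classifier; a real one would call a model. */
    public interface Classifier {
        boolean isExecutionTimeImprovement(Commit commit);
    }

    // Hypothetical user-defined keywords hinting at a performance change.
    private static final List<String> KEYWORDS =
            List.of("performance", "speed up", "speedup", "faster", "optimiz", "latency");

    /** User-defined filter: message mentions a keyword AND at least one .java file changed. */
    public static final Predicate<Commit> USER_FILTER = c -> {
        String msg = c.message().toLowerCase(Locale.ROOT);
        boolean mentionsPerf = KEYWORDS.stream().anyMatch(msg::contains);
        boolean touchesJava = c.changedFiles().stream().anyMatch(f -> f.endsWith(".java"));
        return mentionsPerf && touchesJava;
    };

    /** Cheap filters run first so the (expensive) classifier sees fewer commits. */
    public static List<Commit> candidates(List<Commit> commits, Classifier llm) {
        return commits.stream()
                .filter(USER_FILTER)
                .filter(llm::isExecutionTimeImprovement)
                .toList();
    }

    public static void main(String[] args) {
        List<Commit> commits = List.of(
                new Commit("a1", "Speed up JSON parsing by caching lookups", List.of("src/Parser.java")),
                new Commit("b2", "Fix typo in README", List.of("README.md")),
                new Commit("c3", "Make tests faster", List.of("build.gradle")));
        // Trivial stand-in classifier that accepts everything passing the prefilter.
        Classifier acceptAll = c -> true;
        candidates(commits, acceptAll).forEach(c -> System.out.println(c.sha()));
    }
}
```

Here only `a1` survives: `b2` mentions no keyword, and `c3` mentions "faster" but touches no `.java` file.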

Key takeaway

For AI engineers and research scientists developing automated program repair tools for Java, JETO-Bench provides a reproducible, execution-based environment for evaluating execution time improvement patches. Use JETO-Mine to keep collecting new benchmark instances, and use its evaluation harness to get precise, execution-based feedback on generated patches. This helps target two gaps the study identifies: the limited success of current coding agents and the scarcity of performance-specific tests in Java projects.

Key insights

JETO-Mine creates reproducible Java performance benchmarks, revealing agent limitations and testing gaps.

Method

JETO-Mine's pipeline includes static analysis (GitHub crawl, LLM filter), dynamic analysis (Docker containerization, statistical testing), and an evaluation harness for patches and tests.
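
The statistical-testing step of the dynamic phase can be sketched as follows. The summary does not say which test JETO-Mine uses; this sketch assumes a one-sided Mann-Whitney U test (normal approximation, no tie correction) comparing repeated timings of the original and patched versions, with an assumed significance level of 0.05. The timings in `main` are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch: decide whether a patch significantly reduces execution time
// using a one-sided Mann-Whitney U test (normal approximation, no tie correction).
// The choice of test and alpha are assumptions; the summary does not name JETO-Mine's test.
public class SpeedupTest {

    /** One-sided p-value for H1: patched times tend to be smaller than original times. */
    public static double pValueFaster(double[] patched, double[] original) {
        int nA = patched.length, nB = original.length, n = nA + nB;
        double[] all = new double[n];
        System.arraycopy(patched, 0, all, 0, nA);
        System.arraycopy(original, 0, all, nA, nB);

        // Sort indices by value and assign 1-based ranks, averaging ranks across ties.
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (x, y) -> Double.compare(all[x], all[y]));
        double[] rank = new double[n];
        for (int i = 0; i < n; ) {
            int j = i;
            while (j + 1 < n && all[idx[j + 1]] == all[idx[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0; // average of ranks i+1 .. j+1
            for (int k = i; k <= j; k++) rank[idx[k]] = avg;
            i = j + 1;
        }

        double rankSumA = 0;
        for (int i = 0; i < nA; i++) rankSumA += rank[i]; // first nA slots hold patched timings
        double uA = rankSumA - nA * (nA + 1) / 2.0;     // small when patched runs are faster

        double mean = nA * (double) nB / 2.0;
        double sd = Math.sqrt(nA * (double) nB * (n + 1) / 12.0);
        return normalCdf((uA - mean) / sd); // lower tail: evidence that patched is faster
    }

    /** Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation. */
    static double normalCdf(double z) {
        return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
    }

    static double erf(double x) {
        double sign = Math.signum(x);
        x = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-x * x));
    }

    public static void main(String[] args) {
        // Made-up timings in milliseconds, for illustration only.
        double[] original = {120, 118, 125, 122, 119, 121, 124, 123, 120, 126};
        double[] patched  = {101, 99, 104, 100, 98, 103, 102, 100, 97, 105};
        double p = pValueFaster(patched, original);
        double alpha = 0.05; // assumed significance level
        System.out.printf("p = %.2e, significant speedup: %b%n", p, p < alpha);
    }
}
```

A rank-based test is a reasonable default here because timing distributions are often skewed by outliers such as JIT warm-up or GC pauses; with very small samples, an exact test would be more appropriate than the normal approximation.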

Best for: AI Scientist, Research Scientist, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.