Converted, Not Equivalent: Benchmarking Codebase Conversion via Observational Equivalence

Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

T2J-Bench is a new benchmark for codebase conversion, specifically from PyTorch training codebases to JAX, that reformulates conversion as "transfer under a fixed equivalence contract." Developed by researchers from USC and Google Cloud AI Research, it evaluates each conversion through three ordered stages: Spec (interface admissibility), Numeric (forward outputs, losses, gradients), and Behavioral (short training dynamics). Across 355 blind conversion attempts, the best system reached an overall pass rate of only 26.7–28.9%, even though Spec pass rates went as high as 91.1%. A 4.7x spread in token budget across systems produced only a 2.2x spread in pass rates, and systems systematically overestimated their own success by 66.6–97.8 percentage points. Together these findings suggest that failures stem from self-validation misaligned with the contract, not from limited budget or a weak backbone model.
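
The headline figures above are simple ratios and differences. The sketch below shows one way to compute them, assuming that "spread" means the largest value divided by the smallest across systems and that the overestimation gap is the claimed success rate minus the contract pass rate. The Attempt record, the helper names, and the two toy systems are hypothetical illustrations, not benchmark data.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    claimed_pass: bool   # agent's own verdict after its local self-validation
    contract_pass: bool  # verdict of the external equivalence contract
    tokens: int          # tokens spent on the attempt (hypothetical unit)


def rate_pct(flags):
    # Percentage of True values in an iterable of booleans.
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags)


def overestimation_pp(attempts):
    # Assumed definition: claimed success rate minus contract pass rate,
    # expressed in percentage points.
    claimed = rate_pct(a.claimed_pass for a in attempts)
    actual = rate_pct(a.contract_pass for a in attempts)
    return claimed - actual


def mean_tokens(attempts):
    return sum(a.tokens for a in attempts) / len(attempts)


def spread(values):
    # Assumed definition: ratio of the largest to the smallest value.
    return max(values) / min(values)


# Two hypothetical systems with four attempts each.
system_a = [Attempt(True, True, 100_000), Attempt(True, False, 120_000),
            Attempt(True, False, 80_000), Attempt(False, False, 100_000)]
system_b = [Attempt(True, True, 300_000), Attempt(True, True, 300_000),
            Attempt(True, False, 300_000), Attempt(True, False, 300_000)]

token_spread = spread([mean_tokens(system_a), mean_tokens(system_b)])
pass_spread = spread([rate_pct(a.contract_pass for a in system_a),
                      rate_pct(a.contract_pass for a in system_b)])
```

In this toy setup, tripling the token budget only doubles the pass rate, and both systems claim 50 percentage points more success than the contract grants. That is the same qualitative pattern the summary describes, although the numbers here are made up.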

Key takeaway

AI scientists and ML engineers building coding agents for codebase conversion should prioritize aligning the agent's self-validation with the external semantic contract. Local checks likely overstate success: T2J-Bench observed gaps of 66.6–97.8 percentage points between claimed and actual success. Build source-comparison loops that test the converted code against the original, and apply explicit seam discipline in cross-paradigm conversions, so that a reported success reflects behavioral equivalence rather than mere structural plausibility.
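
To illustrate why a local check can pass while the contract fails, the sketch below contrasts the two kinds of validation on a single cross-framework seam. The seam is an illustrative choice, not one reported in the paper: torch.var defaults to the unbiased estimator (dividing by n - 1), while jax.numpy.var follows NumPy and defaults to ddof=0 (dividing by n). Both functions are plain-Python stand-ins, and all function names are hypothetical.

```python
import math


def source_var(xs):
    # Stand-in for the PyTorch source: torch.var's default unbiased
    # estimator divides by n - 1.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def converted_var(xs):
    # Stand-in for a naive port to jnp.var, whose default ddof=0
    # divides by n instead.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def local_check(fn, xs):
    # Self-validation that only asks: does it run and return a finite float?
    out = fn(xs)
    return isinstance(out, float) and math.isfinite(out)


def contract_check(src, dst, inputs, rtol=1e-6, atol=1e-8):
    # Source-comparison loop: the converted function must match the source
    # reference on every probe input within tolerance.
    return all(
        math.isclose(src(xs), dst(xs), rel_tol=rtol, abs_tol=atol)
        for xs in inputs
    )


probes = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 2.0]]
```

On the first probe the source returns 5.0 / 3 while the port returns 5.0 / 4 = 1.25. The local check is satisfied because the output is a finite float, but the source-comparison check rejects the port.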

Key insights

Coding agents over-trust local validation: they pass surface checks yet fail the semantic contract in codebase conversion.

Method

T2J-Bench verifies each conversion against a fixed equivalence contract derived from the source codebase, in three ordered stages: Spec (interface admissibility), Numeric (forward outputs, losses, gradients), and Behavioral (short training dynamics).
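
The sketch below shows what an ordered, gated three-stage check can look like. It is written in plain Python rather than actual PyTorch and JAX so that it runs without either library. The toy linear model, the names make_codebase and evaluate, the tolerances, and the shared SGD loop are illustrative assumptions, not T2J-Bench's harness.

```python
import inspect
import math

ENTRY_POINTS = ("forward", "loss", "grad")


def allclose(a, b, rtol=1e-5, atol=1e-8):
    # Elementwise tolerance check over two flat lists of floats.
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rtol, abs_tol=atol) for x, y in zip(a, b))


def make_codebase(grad_scale=1.0):
    # Hypothetical "codebase": 1-D linear model y = w * x + b with MSE loss.
    # grad_scale != 1.0 injects a conversion bug into the gradient.
    def forward(params, xs):
        w, b = params
        return [w * x + b for x in xs]

    def loss(params, xs, ys):
        preds = forward(params, xs)
        return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

    def grad(params, xs, ys):
        # Analytic gradient of the MSE loss with respect to (w, b).
        preds = forward(params, xs)
        n = len(xs)
        dw = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        db = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        return [grad_scale * dw, grad_scale * db]

    return {"forward": forward, "loss": loss, "grad": grad}


def spec_stage(src, dst):
    # Interface admissibility: each entry point exists with the same parameters.
    return all(
        name in dst
        and list(inspect.signature(src[name]).parameters)
        == list(inspect.signature(dst[name]).parameters)
        for name in ENTRY_POINTS)


def numeric_stage(src, dst, params, xs, ys):
    # Forward outputs, scalar loss, and gradients must agree on a fixed probe.
    return (allclose(src["forward"](params, xs), dst["forward"](params, xs))
            and allclose([src["loss"](params, xs, ys)],
                         [dst["loss"](params, xs, ys)])
            and allclose(src["grad"](params, xs, ys),
                         dst["grad"](params, xs, ys)))


def behavioral_stage(src, dst, params, xs, ys, steps=20, lr=0.05, rtol=1e-3):
    # Short training run: the same SGD steps from the same init; the loss
    # trajectories must agree within a looser tolerance.
    def trajectory(cb):
        p, losses = list(params), []
        for _ in range(steps):
            losses.append(cb["loss"](p, xs, ys))
            p = [pi - lr * gi for pi, gi in zip(p, cb["grad"](p, xs, ys))]
        return losses
    return allclose(trajectory(src), trajectory(dst), rtol=rtol)


def evaluate(src, dst, params, xs, ys):
    # Stages run in order; the first failing stage is reported.
    stages = [
        ("Spec", lambda: spec_stage(src, dst)),
        ("Numeric", lambda: numeric_stage(src, dst, params, xs, ys)),
        ("Behavioral", lambda: behavioral_stage(src, dst, params, xs, ys)),
    ]
    for name, check in stages:
        if not check():
            return False, name
    return True, None


source, good_port, bad_port = make_codebase(), make_codebase(), make_codebase(0.5)
xs, ys, params = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], [0.0, 0.0]
```

Here the buggy port has the same interface and identical forward outputs and losses, so it clears Spec, but its halved gradient fails at Numeric. Gating the stages this way means that a high Spec pass rate says little about the final result.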

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.