AdaTrans: Automated C to Rust Transformation via Error-Adaptive Repair

2024-07-18 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

The AdaTrans framework, introduced in 2026, automates C-to-Rust code transformation by addressing Rust's strict ownership and borrowing semantics. It combines three components: a Strategy-Driven Retrieval-Augmented Generation (RAG) mechanism that maps compiler errors to specific repair strategies, an Error-Stratified Transformation Strategy (ESTS) that classifies compiler diagnostics to drive adaptive temperature scheduling, and a multi-stage validation pipeline that checks compilability and functional equivalence. Evaluated on a dataset of 104 algorithmic problems from LeetCode, AdaTrans achieved a mean compilation pass rate of 95.51% (± 1.11%) and a mean solve rate of 81.09% (± 3.09%), with an unsafe file rate of 1.19%. These results significantly improve on existing LLM-based tools and zero-shot baselines using gpt-4o-mini, indicating that error-adaptive repair can reconcile transformation correctness with memory safety.

Key takeaway

For AI engineers migrating C codebases to Rust, AdaTrans shows that functional equivalence and memory safety can be pursued together. Consider implementing error-adaptive repair loops that use compiler diagnostics and dynamic temperature scheduling. This approach significantly outperforms brute-force sampling, and it gives you a systematic way to address Rust's strict ownership rules and reduce reliance on unsafe blocks.

Key insights

Adapting LLM repair strategies to compiler error categories significantly improves C-to-Rust transformation correctness and memory safety.

Principles

LLMs struggle with deterministic static constraints like Rust's ownership system.
Compiler feedback is often underutilized for targeted program repair.
Uniform repair strategies are ineffective across heterogeneous error types.

Method

AdaTrans uses a generate-verify-repair loop, classifying compiler diagnostics (SL, MS, LB, AF), retrieving error-specific repair templates via RAG, and adapting LLM generation temperature based on error category and stagnation.
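
The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Diagnostic`, `repair_loop`, `generate`, `verify`, `retrieve`, and `temperature_for` are all hypothetical, and the four callables stand in for the LLM call, the compile-and-test step, the RAG lookup, and the temperature scheduler. The starting temperature and round budget are likewise assumed values.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Diagnostic:
    category: str  # "SL", "MS", "LB", or "AF", the labels used in the text
    message: str


def repair_loop(
    c_source: str,
    generate: Callable[[str, Optional[str], float], str],
    verify: Callable[[str], List[Diagnostic]],
    retrieve: Callable[[Diagnostic], str],
    temperature_for: Callable[[Diagnostic, int], float],
    max_rounds: int = 5,  # assumed budget
) -> Tuple[Optional[str], int]:
    """Return (accepted Rust candidate or None, number of rounds used)."""
    hint: Optional[str] = None
    temperature = 0.2  # assumed starting temperature
    previous: Optional[Diagnostic] = None
    stagnation = 0  # consecutive rounds ending in the same leading diagnostic

    for round_no in range(1, max_rounds + 1):
        # Generate: ask the model for a candidate, guided by the last hint.
        candidate = generate(c_source, hint, temperature)

        # Verify: an empty diagnostic list means the candidate is accepted.
        diagnostics = verify(candidate)
        if not diagnostics:
            return candidate, round_no

        # Repair setup: track stagnation on the leading diagnostic, fetch a
        # category-specific template, and adapt the temperature.
        first = diagnostics[0]
        stagnation = stagnation + 1 if first == previous else 0
        previous = first
        hint = retrieve(first)
        temperature = temperature_for(first, stagnation)

    return None, max_rounds
```

Passing the collaborators in as callables keeps the control flow testable with stubs before a real model or compiler is wired in.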

In practice

Categorize compiler errors to tailor LLM repair strategies effectively.
Integrate RAG with LLMs for context-aware code transformation guidance.
Implement dynamic temperature scheduling for varied error types (e.g., low for syntax, high for logic).
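
As one way to picture the first and third points, the sketch below classifies a rustc diagnostic by its error code and picks a sampling temperature from the resulting bucket, raising it when repair stalls. Everything here is an assumption made for illustration: the bucket names, the code-to-bucket table, and all numeric values are hypothetical and are not the categories or settings used by AdaTrans. The error codes themselves (E0382, E0499, E0502, E0308) are real rustc codes.

```python
import re

# Illustrative buckets only; not the source's SL/MS/LB/AF scheme.
CODE_TO_BUCKET = {
    "E0382": "ownership",  # use of a moved value
    "E0499": "ownership",  # more than one mutable borrow at a time
    "E0502": "ownership",  # mutable and immutable borrows conflict
    "E0308": "type",       # mismatched types
}

# Assumed values: low for syntax, high for logic, as the text suggests.
# "logic" is never produced by classify(); it would be assigned by the
# caller when the code compiles but fails functional tests.
BASE_TEMPERATURE = {"syntax": 0.1, "type": 0.2, "ownership": 0.4, "logic": 0.7}
DEFAULT_TEMPERATURE = 0.3
STAGNATION_STEP = 0.15
MAX_TEMPERATURE = 1.0


def classify(diagnostic: str) -> str:
    """Map one rustc diagnostic line to an illustrative bucket."""
    match = re.search(r"error\[(E\d{4})\]", diagnostic)
    if match is None:
        # Assumption: diagnostics without an error code are treated as syntax.
        return "syntax"
    return CODE_TO_BUCKET.get(match.group(1), "other")


def schedule_temperature(bucket: str, stagnant_rounds: int) -> float:
    """Base temperature per bucket, raised for each stagnant round, capped."""
    base = BASE_TEMPERATURE.get(bucket, DEFAULT_TEMPERATURE)
    raised = base + STAGNATION_STEP * stagnant_rounds
    return round(min(MAX_TEMPERATURE, raised), 2)
```

For example, `schedule_temperature(classify(line), stagnant_rounds)` could supply the temperature for the next repair attempt, with `stagnant_rounds` counting consecutive attempts that ended in the same diagnostic.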

Topics

C-to-Rust Transformation
Large Language Models
Automated Program Repair
Memory Safety
Retrieval-Augmented Generation
Error-Stratified Repair

Best for: Research Scientist, AI Scientist, AI Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.