Binary Decompilation LLM with Feedback-Driven Multi-Turn Refinement

2025-01-14 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

AutoDecompiler is a large language model trained with reinforcement learning for feedback-driven, multi-turn binary decompilation. Unlike single-turn LLM methods, which generate code once, AutoDecompiler iteratively refines its decompiled code using feedback from compilation, execution, and input/output testing. Training uses decompilation-specific rewards for code validity, recompilability, execution consistency, and semantic fidelity, together with stage-aware diagnostic feedback, progress-aware trajectory rewarding, and turn-aware advantage reweighting to encourage beneficial revisions. On the HumanEval and ExeBench benchmarks, AutoDecompiler consistently outperforms single-turn LLM counterparts of the same size and input setting, with significant improvements in behavioral re-executability (Re-exe). For instance, AutoDecompiler-E2E 1.3B improved on the best prior Re-exe result by 5.19%, and the 30B variant improved Re-exe over DeepSeek-V3 by 16.16%. The model family spans 1.3B, 6.7B, and 30B parameters and was trained on a comparatively small dataset of 0.31 million samples.

Key takeaway

AI engineers and ML scientists building code generation models for security tasks should move beyond single-turn generation and integrate iterative, feedback-driven refinement. This approach improves functional correctness and re-executability, even with smaller training datasets. Consider multi-dimensional rewards and stage-aware diagnostic feedback to improve model reliability and reduce hallucinations in critical applications such as vulnerability discovery and malware inspection.

Key insights

Binary decompilation improves significantly when it is treated as an iterative, feedback-driven refinement process trained with reinforcement learning.

Principles

Decompilation quality requires multi-dimensional rewards.
Iterative refinement needs explicit progress signals.
Heterogeneous feedback must be turned into actionable guidance.
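
To make the first two principles concrete, here is a minimal Python sketch of a staged, multi-dimensional reward plus a progress-aware trajectory score. It is an illustration only, not the paper's formulation: the TurnOutcome fields, the WEIGHTS values, and the progress_bonus parameter are all hypothetical, since the summary does not give the actual reward definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TurnOutcome:
    """Validation results for one refinement turn (hypothetical fields)."""
    parses: bool       # code validity
    compiles: bool     # recompilability
    runs: bool         # executes without crashing
    tests_passed: int  # I/O tests whose output matched the reference
    tests_total: int


# Hypothetical weights; the source does not state the real values.
WEIGHTS = {"valid": 0.1, "compile": 0.2, "run": 0.2, "io": 0.5}


def turn_reward(o: TurnOutcome) -> float:
    """Staged weighted sum: a later stage counts only if earlier ones passed."""
    reward = 0.0
    if not o.parses:
        return reward
    reward += WEIGHTS["valid"]
    if not o.compiles:
        return reward
    reward += WEIGHTS["compile"]
    if not o.runs:
        return reward
    reward += WEIGHTS["run"]
    if o.tests_total > 0:
        reward += WEIGHTS["io"] * o.tests_passed / o.tests_total
    return reward


def trajectory_reward(outcomes: List[TurnOutcome],
                      progress_bonus: float = 0.1) -> float:
    """Final-turn reward plus a bonus for each turn that beats the best so far."""
    if not outcomes:
        return 0.0
    scores = [turn_reward(o) for o in outcomes]
    best = scores[0]
    bonus = 0.0
    for score in scores[1:]:
        if score > best:
            bonus += progress_bonus
            best = score
    return scores[-1] + bonus
```

With these assumed weights, a trajectory that moves from a compile failure to partially passing tests to fully passing tests scores higher than one that reaches the same final state without intermediate improvement. This is one simple way to give the policy an explicit progress signal.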

Method

AutoDecompiler first applies supervised fine-tuning (SFT) to a base LLM, then optimizes it with reinforcement learning (RL) for multi-turn refinement. At each turn it validates the generated code, computes scalar rewards, and constructs natural-language diagnostic feedback for the next turn.
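
The inference-time side of this loop can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not AutoDecompiler's actual interface: the Generator and Validator callables, the stage names ("compile", "execute", "io", "pass"), and the feedback templates are all hypothetical stand-ins for the model call and for the compile, run, and test harness.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces. A validator returns (stage, detail), where stage is
# "compile", "execute", "io", or "pass". A generator maps the assembly text and
# the previous turn's feedback (None on the first turn) to candidate source code.
Validator = Callable[[str], Tuple[str, str]]
Generator = Callable[[str, Optional[str]], str]

# Illustrative stage-aware templates; the paper's wording is not known here.
FEEDBACK_TEMPLATES = {
    "compile": "The code failed to compile. Compiler output: {d} "
               "Fix the reported errors.",
    "execute": "The code compiled but crashed at run time: {d} "
               "Check memory access and control flow.",
    "io": "The code ran but produced wrong output: {d} "
          "Revise the logic to match the expected behavior.",
}


def build_feedback(stage: str, detail: str) -> str:
    """Turn a raw failure into stage-specific natural-language guidance."""
    return FEEDBACK_TEMPLATES[stage].format(d=detail)


def refine(asm: str, generate: Generator, validate: Validator,
           max_turns: int = 3) -> Tuple[str, List[str]]:
    """Generate, validate, and revise until validation passes or turns run out.

    Returns the last candidate and the list of stages reached on each turn.
    """
    feedback: Optional[str] = None
    history: List[str] = []
    code = ""
    for _ in range(max_turns):
        code = generate(asm, feedback)
        stage, detail = validate(code)
        history.append(stage)
        if stage == "pass":
            break
        feedback = build_feedback(stage, detail)
    return code, history
```

In a real setup, generate would wrap the model and validate would wrap a compiler plus an I/O test runner. The per-turn stage history is the kind of record a training pipeline could use to compute scalar rewards.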

In practice

Implement feedback loops for LLM-generated code.
Design multi-dimensional rewards for complex tasks.
Utilize RL to learn iterative code refinement.

Topics

Binary Decompilation
Reinforcement Learning
Large Language Models
Code Refinement
Software Security
Functional Correctness

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.