Binary Decompilation LLM with Feedback-Driven Multi-Turn Refinement
Summary
AutoDecompiler is a large language model trained with reinforcement learning (RL) for feedback-driven, multi-turn binary decompilation. Unlike single-turn LLM methods, which generate code once, AutoDecompiler iteratively refines its decompiled code using feedback from compilation, execution, and input/output testing. Its training uses decompilation-specific rewards for code validity, recompilability, execution consistency, and semantic fidelity. It also uses stage-aware diagnostic feedback, progress-aware trajectory rewarding, and turn-aware advantage reweighting to steer the model toward beneficial revisions. Evaluated on the HumanEval and ExeBench benchmarks, AutoDecompiler consistently outperforms single-turn LLM counterparts of the same size and input setting, with significant improvements in behavioral re-executability (Re-exe). For instance, AutoDecompiler-E2E 1.3B improved the best prior Re-exe result by 5.19%, and the 30B variant improved Re-exe over DeepSeek-V3 by 16.16%. The model family, which spans 1.3B, 6.7B, and 30B parameters, was trained on a comparatively small dataset of 0.31 million samples.
Key takeaway
AI engineers and ML scientists developing code generation models for security tasks should move beyond single-turn generation and integrate iterative, feedback-driven refinement. In the reported results, this approach significantly improves functional correctness and re-executability, even with a comparatively small training set. Consider implementing multi-dimensional rewards and stage-aware diagnostic feedback to improve model reliability and reduce hallucinations in critical applications such as vulnerability discovery and malware inspection.
Key insights
Treating binary decompilation as an iterative, feedback-driven refinement process trained with reinforcement learning significantly improves its results.
Principles
- Decompilation quality requires multi-dimensional rewards.
- Iterative refinement needs explicit progress signals.
- Heterogeneous feedback must be turned into actionable guidance.
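As a rough illustration of the first two principles, the sketch below combines the four reward dimensions named in the summary (validity, recompilability, execution consistency, semantic fidelity) into one scalar and derives a simple per-turn progress signal. The `StageResult` fields, the weights, the gating of later stages on earlier ones, and the "improvement over best so far" rule are all assumptions made for this example; the summary does not give the paper's actual formulas.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StageResult:
    parses: bool        # code validity
    compiles: bool      # recompilability
    runs: bool          # executes without crashing
    tests_passed: int   # I/O tests whose output matched
    tests_total: int


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"valid": 0.1, "compile": 0.2, "run": 0.2, "semantic": 0.5}


def reward(r: StageResult) -> float:
    """Combine four reward dimensions into one scalar in [0, 1]."""
    pass_rate = r.tests_passed / r.tests_total if r.tests_total else 0.0
    # Assumption: a later stage only earns credit if earlier stages succeeded.
    valid = 1.0 if r.parses else 0.0
    compiled = valid * (1.0 if r.compiles else 0.0)
    ran = compiled * (1.0 if r.runs else 0.0)
    semantic = ran * pass_rate
    return (WEIGHTS["valid"] * valid
            + WEIGHTS["compile"] * compiled
            + WEIGHTS["run"] * ran
            + WEIGHTS["semantic"] * semantic)


def progress_bonuses(turn_rewards: List[float]) -> List[float]:
    """Per-turn progress signal: improvement over the best reward seen so far."""
    bonuses = []
    best = 0.0
    for r in turn_rewards:
        bonuses.append(max(0.0, r - best))
        best = max(best, r)
    return bonuses
```

Under this scheme a turn that regresses or merely repeats an earlier result earns no bonus, so only revisions that actually improve on the trajectory are credited.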
Method
AutoDecompiler first applies supervised fine-tuning (SFT) to a base LLM, then optimizes it with RL for multi-turn refinement. At each turn it validates the generated code, computes scalar rewards, and constructs natural-language diagnostic feedback for the next turn.
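The validate-then-feedback loop described above might look roughly like the following at inference time. This is a minimal sketch, not the paper's implementation: `Validation`, `diagnostic_feedback`, `refine`, the stage names, the message templates, and the `generate`/`validate` callables are all hypothetical names introduced here, and the model and toolchain are passed in as plain functions so the sketch stays self-contained.

```python
from typing import Callable, List, NamedTuple, Tuple


class Validation(NamedTuple):
    stage: str    # "compile", "execute", "io_test", or "ok"
    detail: str   # compiler error, crash message, or failing test case


def diagnostic_feedback(v: Validation) -> str:
    """Turn a raw validation result into stage-specific guidance."""
    templates = {
        "compile": "The code failed to compile. Fix this error: {d}",
        "execute": "The code compiled but crashed at run time: {d}",
        "io_test": "The code ran but produced wrong output: {d}",
    }
    return templates[v.stage].format(d=v.detail)


def refine(asm: str,
           generate: Callable[[str, List[str]], str],
           validate: Callable[[str], Validation],
           max_turns: int = 4) -> Tuple[str, int, bool]:
    """Generate, validate, and revise until validation passes or turns run out.

    Returns (last code, turns used, whether validation passed).
    """
    feedback: List[str] = []
    code = ""
    for turn in range(1, max_turns + 1):
        # The model sees the assembly plus all feedback gathered so far.
        code = generate(asm, feedback)
        result = validate(code)
        if result.stage == "ok":
            return code, turn, True
        feedback.append(diagnostic_feedback(result))
    return code, max_turns, False
```

In a real setup, `generate` would call the model and `validate` would run the compiler, the binary, and the I/O tests in order, stopping at the first failing stage.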
In practice
- Implement feedback loops for LLM-generated code.
- Design multi-dimensional rewards for complex tasks.
- Utilize RL to learn iterative code refinement.
Topics
- Binary Decompilation
- Reinforcement Learning
- Large Language Models
- Code Refinement
- Software Security
- Functional Correctness
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.