OASIF: An Efficient Obfuscation-Aware Self-Improving Framework for LLM-Based Assembly Code Instruction Following and Comprehension

2019-04-12 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, long

Summary

OASIF is an Obfuscation-Aware Self-Improving Framework for LLM-based assembly code instruction following, designed to improve Large Language Model (LLM) comprehension of obfuscated assembly code. It addresses two obstacles, extensive code length and costly supervision, by pairing a token-efficient assembly encoder with a lightweight projector so that pretrained code LLMs can process long obfuscated sequences within a bounded context. Training proceeds in three phases: feature-space alignment, supervised instruction fine-tuning, and online self-evolving reinforcement learning with hybrid rewards for continuous adaptation. On VMISA-Bench, OASIF raised the Success Rate of Qwen2.5-Coder-Instruct-14B by +15.9 pp on Code Virtualizer, +5.8 pp on Themida (v3.0.7), and +16.9 pp on VMProtect (v3.5), with an overall average improvement of +9.8 pp. It also maintains performance on general and domain-relevant benchmarks.

Key takeaway

For AI Security Engineers and Reverse Engineers analyzing commercial-grade obfuscated binaries, conventional LLM approaches often fail because of context limits and heavy obfuscation. Consider OASIF's three-phase training, especially its online self-evolving reinforcement learning, to improve LLM-based assembly comprehension. Token-efficient assembly encoding and hybrid reward-based self-improvement can yield substantial gains, as shown by the reported +9.8 pp average Success Rate improvement on challenging VM-based obfuscators.

Key insights

OASIF enables LLMs to comprehend heavily obfuscated assembly code through a multi-phase training framework that culminates in self-evolving reinforcement learning.

Principles

Token-efficient encoding is crucial for long obfuscated code.
Self-evolving RL enables continuous adaptation with minimal manual verification.
Hybrid structural and semantic rewards drive self-improvement.
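
As a rough illustration of the third principle, the sketch below combines a structural score and a semantic score into one scalar reward. The summary does not say how OASIF defines either component, so everything here is an assumption made for illustration: the `<answer>` output format, the token-overlap (Jaccard) proxy for semantics, the reference answer, the 0.5 weight, and all function names.

```python
import re

# Hypothetical output format: the whole response must be wrapped in <answer> tags.
ANSWER_RE = re.compile(r"\s*<answer>(.+?)</answer>\s*", re.DOTALL)


def _tokens(text: str) -> set:
    """Lower-cased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def structural_reward(output: str) -> float:
    """1.0 if the output follows the assumed <answer>...</answer> format, else 0.0."""
    return 1.0 if ANSWER_RE.fullmatch(output) else 0.0


def semantic_reward(output: str, reference: str) -> float:
    """Jaccard overlap of word tokens, a crude stand-in for a semantic score."""
    match = ANSWER_RE.fullmatch(output)
    body = match.group(1) if match else output
    out, ref = _tokens(body), _tokens(reference)
    if not out or not ref:
        return 0.0
    return len(out & ref) / len(out | ref)


def hybrid_reward(output: str, reference: str, w_struct: float = 0.5) -> float:
    """Weighted sum of the structural and semantic scores (weight is an assumption)."""
    return (w_struct * structural_reward(output)
            + (1.0 - w_struct) * semantic_reward(output, reference))
```

With these assumptions, a well-formed and correct answer scores 1.0, while an answer that is only well-formed or only correct scores 0.5, so the policy is rewarded for satisfying both criteria rather than either one alone.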

Method

OASIF trains in three phases: feature-space alignment, supervised instruction fine-tuning, and online self-evolving reinforcement learning with hybrid structural and semantic rewards for continuous adaptation.
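
The phase ordering can be written down as a small schedule. This is only a sketch of the sequencing named above: the `Phase` dataclass, the `run_schedule` helper, the per-phase objective strings, and the idea of passing a state dictionary between trainer callbacks are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Phase:
    name: str
    objective: str  # informal description; wording is an assumption


# The three phases in the order the summary lists them.
SCHEDULE: List[Phase] = [
    Phase("feature_space_alignment",
          "align assembly-encoder features with the LLM embedding space"),
    Phase("supervised_instruction_fine_tuning",
          "fit instruction/response pairs"),
    Phase("online_self_evolving_rl",
          "maximize a hybrid structural + semantic reward"),
]


def run_schedule(state: dict, trainers: Dict[str, Callable[[dict], dict]]) -> dict:
    """Run each phase's trainer in order, threading the state through.

    `trainers` maps a phase name to a callable that takes and returns the
    state; a KeyError is raised if a phase has no trainer.
    """
    for phase in SCHEDULE:
        state = trainers[phase.name](state)
        state.setdefault("history", []).append(phase.name)
    return state
```

Each trainer would receive whatever the previous phase produced, which mirrors the idea that reinforcement learning starts from the instruction-tuned model rather than from the pretrained one.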

In practice

Use a dedicated assembly encoder for compact representation.
Introduce special tokens for structured LLM input.
Generate synthetic data for instruction-centric supervision.
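
The first two practices can be sketched together: a long assembly listing is split into chunks, each chunk is compressed into one vector, a projector maps that vector to the LLM's embedding width, and special tokens mark where the assembly slots begin and end. This is a toy under stated assumptions, not OASIF's architecture: the `<asm_start>`/`<asm_end>` token names, the chunk size, the mean-pooling "encoder", and the plain matrix projector are all hypothetical.

```python
import random
from typing import List, Sequence

ASM_START, ASM_END = "<asm_start>", "<asm_end>"  # hypothetical special tokens


def encode_chunk(instructions: Sequence[str], dim: int) -> List[float]:
    """Toy stand-in for an assembly encoder.

    Each instruction gets a deterministic pseudo-random vector (seeded by its
    text); the chunk embedding is the mean of those vectors.
    """
    pooled = [0.0] * dim
    for ins in instructions:
        rng = random.Random(ins)
        for i in range(dim):
            pooled[i] += rng.uniform(-1.0, 1.0) / len(instructions)
    return pooled


def project(vec: Sequence[float], weight: Sequence[Sequence[float]]) -> List[float]:
    """Linear projector: `weight` has one row per output (LLM embedding) dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def build_llm_input(asm_lines: Sequence[str], question_tokens: Sequence[str],
                    weight: Sequence[Sequence[float]], chunk_size: int = 8) -> list:
    """Return the mixed sequence: start marker, one projected vector per chunk,
    end marker, then the ordinary text tokens of the question."""
    chunks = [asm_lines[i:i + chunk_size] for i in range(0, len(asm_lines), chunk_size)]
    enc_dim = len(weight[0])
    asm_slots = [project(encode_chunk(c, enc_dim), weight) for c in chunks]
    return [ASM_START, *asm_slots, ASM_END, *question_tokens]
```

In this sketch, 20 instructions with a chunk size of 8 occupy only 3 embedding slots plus 2 marker tokens, which is the kind of saving that lets long obfuscated sequences fit in a bounded context.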

Topics

Assembly Code Analysis
Code Obfuscation
Large Language Models
Reinforcement Learning
Binary Analysis
VMISA-Bench

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.