Multi-View Decompilation for LLM-Based Malware Classification

2026-06-18 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Advanced, quick

Summary

A new approach, Multi-View Decompilation, addresses the fragility of existing LLM-based malware classification pipelines that rely on a single decompiler view. Recognizing that decompilers are lossy heuristic tools and produce different artifacts, researchers curated a benchmark of benign and malicious programs. Each sample was compiled and then decompiled using both Ghidra and RetDec, yielding matched pseudo-C views. Across various LLMs, providing both decompiler views significantly improved malicious-class F1 scores, primarily by increasing recall on malicious samples. Agreement analyses confirmed that Ghidra and RetDec make partially different errors, supporting the idea that their outputs offer complementary evidence. This multi-decompiler prompting method is a simple, training-free way to enhance LLM-based malware triage.

Key takeaway

AI Security Engineers building LLM-based malware classifiers should consider multi-decompiler prompting. A single decompiler view is fragile: in the reported experiments, giving the LLM both the Ghidra and the RetDec output for the same binary significantly raised recall on malicious samples and, with it, malicious-class F1. Because the approach needs no training, it is a practical, low-cost way to improve detection in real-world triage.

Key insights

Using multiple decompiler views improves LLM-based malware classification by providing complementary evidence.

Principles

Decompilers are lossy heuristic tools.
Different decompilers expose complementary artifacts.
Multi-decompiler prompting boosts LLM recall.
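
One way to check these principles on your own data is to score each single-view run on the malicious class and then measure how often the two views agree and where their errors overlap. The sketch below is illustrative only: the function names, the 1 = malicious / 0 = benign encoding, and the choice of statistics are assumptions, not the evaluation code or metrics definition used in the original study.

```python
from typing import Dict, Sequence, Tuple


def malicious_prf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the malicious class (label 1).

    Assumed encoding: 1 = malicious, 0 = benign.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


def agreement_rate(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of samples on which two prediction lists give the same label."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        return 0.0
    same = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
    return same / len(preds_a)


def error_overlap(y_true: Sequence[int],
                  preds_a: Sequence[int],
                  preds_b: Sequence[int]) -> Dict[str, int]:
    """Count samples where only view A, only view B, or both views are wrong."""
    counts = {"only_a_wrong": 0, "only_b_wrong": 0, "both_wrong": 0}
    for t, a, b in zip(y_true, preds_a, preds_b):
        a_wrong, b_wrong = a != t, b != t
        if a_wrong and b_wrong:
            counts["both_wrong"] += 1
        elif a_wrong:
            counts["only_a_wrong"] += 1
        elif b_wrong:
            counts["only_b_wrong"] += 1
    return counts
```

If the "only one view wrong" counts are large relative to "both wrong", the two decompilers are failing on different samples, which is the situation in which a second view can plausibly add evidence.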

Method

The proposed method involves feeding LLMs pseudo-C code from multiple decompilers, such as Ghidra and RetDec, for the same binary to improve classification.
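
A minimal sketch of what such a prompt could look like follows. The prompt wording, the per-view character budget, and the helper names (`build_multi_view_prompt`, `parse_label`) are hypothetical and chosen for illustration; the original article's prompt template is not reproduced here, and the pseudo-C strings are assumed to have been exported from Ghidra and RetDec beforehand.

```python
from typing import Optional

# Hypothetical prompt wording; not the template used in the original study.
PROMPT_TEMPLATE = (
    "You are a malware analyst. Two decompilers produced pseudo-C for the SAME binary.\n"
    "The views may disagree because decompilation is lossy; use both as evidence.\n\n"
    "=== View 1: Ghidra ===\n{ghidra}\n\n"
    "=== View 2: RetDec ===\n{retdec}\n\n"
    "Answer with exactly one word: benign or malicious."
)


def truncate(text: str, max_chars: int) -> str:
    """Cut a view to a character budget and mark the cut explicitly."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n/* ... truncated ... */"


def build_multi_view_prompt(ghidra_code: str,
                            retdec_code: str,
                            max_chars_per_view: int = 6000) -> str:
    """Combine both decompiler views of one binary into a single prompt.

    max_chars_per_view is an assumed budget; tune it to the model's context window.
    """
    return PROMPT_TEMPLATE.format(
        ghidra=truncate(ghidra_code, max_chars_per_view),
        retdec=truncate(retdec_code, max_chars_per_view),
    )


def parse_label(response: str) -> Optional[str]:
    """Map a model reply to 'benign' or 'malicious'; return None if unclear."""
    words = response.strip().lower().split()
    first = words[0].strip(".,:;!\"'") if words else ""
    return first if first in {"benign", "malicious"} else None
```

The resulting string can be sent to whichever LLM client is already in use. Giving each view its own budget keeps one verbose decompiler output from crowding the other out of the context window.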

In practice

Integrate Ghidra and RetDec outputs.
Apply multi-decompiler prompting.
Enhance LLM-based malware triage.

Topics

Malware Classification
Large Language Models
Decompilation
Ghidra
RetDec
Cybersecurity

Best for: AI Engineer, Machine Learning Engineer, Research Scientist, AI Security Engineer, NLP Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.