ViPER: Vision-based Packing-Aware Encoder for Robust Malware Detection

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Computer Vision) · Depth: Expert, quick

Summary

ViPER (Vision-based Packing-Aware Encoder) targets a known failure mode of visualization-based malware detection: executable packing, which compresses or encrypts a binary's contents and so changes how it appears as an image. The model pairs a LoRA-adapted ViT-B/14 backbone with a dual-head architecture that jointly learns malware classification and packing detection. Its central component is a packing-aware gating mechanism that conditions the malware prediction on the inferred packing state, so packed and unpacked inputs can receive distinct decision boundaries. To counter packing label skew during training, ViPER uses frequency-weighted losses with stratified sampling. Evaluated on 200,000 Windows PE byteplot images, ViPER reached a balanced accuracy of 0.8521, a ROC-AUC of 0.9260, and an AUPR of 0.9279 for malware detection, reported as surpassing the state-of-the-art baselines it was compared against, along with a packing detection AUC of 0.9949.
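Balanced accuracy, the first metric quoted above, is the mean of per-class recall, so a detector cannot score well by favoring one class. The following is a minimal sketch of that definition for binary labels; the function name and the toy labels are illustrative assumptions and are not taken from the paper's data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and true-negative rate (1 = malware, 0 = benign).

    Assumes both classes occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (tn + fp)
    return (true_positive_rate + true_negative_rate) / 2


# Toy example (made-up labels): a detector that always answers "benign".
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))
```

In this toy case plain accuracy is 0.9 while balanced accuracy is 0.5, which is why the latter is the more informative figure when one class dominates.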

Key takeaway

For AI security engineers building malware detectors, ViPER suggests that building packing awareness directly into a vision model improves performance on packed binaries that evade visualization-based methods. Consider dual-head architectures and conditional prediction mechanisms when a known confounder such as packing shifts the input distribution, particularly where its labels are skewed. On the reported benchmark, the approach outperforms the state-of-the-art baselines it was compared against (balanced accuracy 0.8521, ROC-AUC 0.9260).

Key insights

ViPER integrates packing awareness into vision-based malware detection using a dual-head model and conditional predictions for enhanced robustness.
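To make "conditional predictions" concrete, here is one possible form of a packing-aware gate, written in plain Python. The summary does not specify how ViPER's gate is computed, so the soft blend below, the function names, and the example logits are all assumptions for illustration: the packing head's probability mixes two malware logits, one tuned for packed inputs and one for unpacked inputs.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_malware_probability(packing_logit, logit_if_packed, logit_if_unpacked):
    """Blend two malware logits using the inferred packing probability.

    Hypothetical soft gate, not the paper's confirmed formulation.
    Returns (malware_probability, packing_probability).
    """
    p_packed = sigmoid(packing_logit)
    gated_logit = p_packed * logit_if_packed + (1.0 - p_packed) * logit_if_unpacked
    return sigmoid(gated_logit), p_packed


# Same pair of malware logits, three different packing beliefs (made-up values).
for packing_logit in (-6.0, 0.0, 6.0):
    malware_prob, p_packed = gated_malware_probability(packing_logit, 2.0, -2.0)
    print(f"p_packed={p_packed:.3f} -> p_malware={malware_prob:.3f}")
```

When the packing head is confident, the output follows the matching branch almost entirely; when it is uncertain, the two branches are averaged. A hard switch or a learned gating layer would be equally plausible readings of the description.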

Method

ViPER uses a LoRA-adapted ViT-B/14 backbone with two heads, one for malware classification and one for packing detection. A gating mechanism conditions the malware prediction on the inferred packing state, and training uses frequency-weighted losses with stratified sampling to offset packing label skew.
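The training-side countermeasures can be sketched as follows. This is a rough illustration under stated assumptions: the summary does not give ViPER's weighting formula or sampling procedure, so the inverse-frequency rule, the proportional per-batch allocation, both helper names, and the toy label counts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so rarer classes weigh more in the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def stratified_batch(labels, batch_size, rng):
    """Sample indices so each label's share of the batch tracks its share of the data."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    batch = []
    for indices in by_label.values():
        # Proportional allocation, with at least one example per stratum;
        # rounding means the batch can be slightly off the requested size.
        quota = max(1, round(batch_size * len(indices) / len(labels)))
        batch.extend(rng.sample(indices, min(quota, len(indices))))
    rng.shuffle(batch)
    return batch


# Toy packing labels (made up): 1 = packed, 0 = unpacked.
packing_labels = [1] * 90 + [0] * 10
weights = inverse_frequency_weights(packing_labels)
batch = stratified_batch(packing_labels, 20, random.Random(0))
unpacked_in_batch = sum(packing_labels[i] == 0 for i in batch)
print(weights)
print(unpacked_in_batch, "unpacked examples in a batch of", len(batch))
```

With these toy counts the rare class receives nine times the loss weight of the common class, and every batch is guaranteed to contain some rare-class examples, so the packing head sees both states at each step.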

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.