Can Small GenAI Language Models Rival Large Language Models in Understanding Application Behavior?

2026-06-12 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Software Development & Engineering · Depth: Expert, long

Summary

A study by Mohammad Meymani et al. from the Canadian Institute for Cybersecurity systematically evaluated small and large Generative AI (GenAI) language models on understanding application behavior, with a focus on malware detection. Using 10,000 samples from the SBAN dataset, the researchers compared models from the DeepSeek, Phi, Llama, Qwen, and Mistral families, primarily through a prompt-based strategy. Larger models, such as Qwen2.5-Coder, generally achieved higher overall accuracy. Smaller models such as Phi-4 mini nonetheless showed competitive precision and recall, including a high F1 score for the malware class (Class "1") and the second-highest F1 score for the benign class (Class "0"). Smaller models also bring lower computational cost, faster inference, and suitability for resource-constrained environments, although some, like Deepseek-coder-1.3b, showed lower recall on benign samples. The findings suggest that small GenAI models can effectively complement larger ones, balancing performance against resource efficiency.

Key takeaway

For AI security engineers evaluating malware detection solutions, this research indicates that small language models (SLMs) such as Phi-4 mini offer a compelling balance of performance and resource efficiency. The results suggest that SLMs can deliver competitive detection with faster inference and lower computational cost than larger LLMs. Consider deploying SLMs in resource-constrained environments or for real-time analysis, and explore fine-tuning or model compression to further enhance their robustness.

Key insights

Small GenAI models can rival larger LLMs in malware detection, offering efficiency for resource-constrained environments.

Principles

Larger LLMs generally achieve higher overall accuracy.
Small models offer competitive precision and recall.
Inference time increases with model size.
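
The distinction between overall accuracy and per-class precision, recall, and F1 matters when reading these principles, because a model can lead on one and trail on another. The following is a minimal sketch of how the per-class figures are conventionally computed, using the Class "1" (malware) and Class "0" (benign) labels from the summary. The function names and the toy labels are illustrative assumptions, not the paper's code or data.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 with `positive` treated as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (made up for illustration): 1 = malware, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

malware = per_class_metrics(y_true, y_pred, positive=1)
benign = per_class_metrics(y_true, y_pred, positive=0)
overall = accuracy(y_true, y_pred)

print("malware:", malware)
print("benign: ", benign)
print("accuracy:", overall)
```

In this toy case the one missed malware sample lowers malware recall and benign precision while leaving malware precision and benign recall at 1.0, which is why per-class figures are reported alongside accuracy.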

Method

The study evaluated GenAI models for malware detection on 10,000 samples from the SBAN dataset. It compared two strategies, a classification head and zero-shot prompting, focusing on the latter, and measured accuracy, precision, recall, and F1-score.
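
To make the zero-shot prompt-based strategy concrete, here is a minimal sketch of one way such a classifier could be wired up: build a prompt around the sample, ask the model for a single-digit label, and parse the reply. The prompt wording, the `max_chars` truncation, and the `generate` callable (any function that maps a prompt string to the model's text reply) are all assumptions for illustration. The summary does not describe the paper's actual prompt or inference code.

```python
import re

# Hypothetical prompt wording; the study's real prompt is not given here.
PROMPT_TEMPLATE = (
    "You are a malware analyst. Read the application code below and answer "
    "with a single digit: 1 if it is malware, 0 if it is benign.\n\n"
    "Code:\n{code}\n\nAnswer:"
)


def build_prompt(code, max_chars=4000):
    """Insert the (truncated) sample into the zero-shot prompt."""
    return PROMPT_TEMPLATE.format(code=code[:max_chars])


def parse_label(response):
    """Return 0 or 1 from the first standalone digit, or None if absent."""
    match = re.search(r"\b([01])\b", response)
    return int(match.group(1)) if match else None


def classify(code, generate):
    """Classify one sample using a caller-supplied text-generation function."""
    return parse_label(generate(build_prompt(code)))


# Stand-in for a real model call, used only to exercise the plumbing.
def fake_generate(prompt):
    return "Answer: 1" if "keylogger" in prompt else "0"


print(classify("def start_keylogger(): ...", fake_generate))
print(classify("print('hello')", fake_generate))
```

Returning `None` for unparseable replies keeps refusals and off-format outputs visible, so they can be counted separately instead of being silently scored as one of the two classes.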

In practice

Deploy small GenAI models for efficient malware detection.
Prioritize recall over speed for malware prevention systems.
Fine-tune SLMs on diverse malware datasets.

Topics

Small Language Models
Malware Detection
Application Behavior Analysis
Generative AI
Computational Efficiency
Resource-Constrained Environments

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.