CTIConnect: A Benchmark for Retrieval-Augmented LLMs over Heterogeneous Cyber Threat Intelligence
Summary
CTIArena is introduced as the first benchmark for evaluating large language models (LLMs) on heterogeneous, multi-source cyber threat intelligence (CTI) in knowledge-augmented settings. It addresses limitations of prior efforts by covering nine tasks across structured, unstructured, and hybrid CTI categories, with 691 high-quality QA pairs in total. An evaluation of ten widely used LLMs, including proprietary models such as GPT-5 and open-source models such as LLaMA-3-405B, shows that most perform poorly in closed-book settings but improve noticeably when augmented with security-specific knowledge through techniques such as cybersecurity knowledge graph (CSKG)-guided retrieval-augmented generation (RAG) and query-expanded RAG. These findings indicate that scaling model size alone is insufficient for CTI; domain-tailored knowledge augmentation is crucial.
Key takeaway
AI scientists and machine learning engineers developing CTI solutions should prioritize domain-specific knowledge augmentation over relying solely on larger, general-purpose LLMs. Implement tailored retrieval-augmented generation (RAG) strategies, such as CSKG-guided RAG for unstructured data or query-expanded RAG for hybrid tasks, to improve performance and reduce hallucinations. This approach is critical for building robust CTI copilots that can reason across diverse and fragmented intelligence sources.
Key insights
LLMs require domain-specific knowledge augmentation and tailored retrieval strategies for effective cyber threat intelligence analysis.
Principles
- Structured CTI tasks achieve near-perfect accuracy with external knowledge.
- Hybrid CTI tasks demand precise knowledge retrieval and grounding.
- Unstructured CTI performance hinges on cross-report synthesis.
Method
CTIArena uses a three-stage pipeline: seed correlation annotation, factually-grounded QA synthesis via templates, and LLM-human collaborative curation for quality control.
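The three stages can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the class names, relation types, template wording, placeholder identifiers, and the two reviewer callbacks are all hypothetical, since the summary does not describe these details.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SeedCorrelation:
    """Stage 1 output: one annotated link between two CTI artifacts."""
    source_id: str  # placeholder identifier, not a real CTI entry
    relation: str
    target_id: str


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str
    evidence: SeedCorrelation  # keeps every answer traceable to its seed


# Stage 2: one question template per relation type (hypothetical wording).
TEMPLATES = {
    "maps_to_weakness": "Which weakness entry is {source_id} mapped to?",
    "mitigated_by": "Which mitigation entry addresses {source_id}?",
}


def synthesize_qa(seed: SeedCorrelation) -> QAPair:
    """Fill the template for the seed's relation; the answer is the seed's target."""
    question = TEMPLATES[seed.relation].format(source_id=seed.source_id)
    return QAPair(question=question, answer=seed.target_id, evidence=seed)


def curate(
    pairs: List[QAPair],
    llm_check: Callable[[QAPair], bool],
    human_check: Callable[[QAPair], bool],
) -> List[QAPair]:
    """Stage 3: keep a pair only if it passes both the LLM and the human review."""
    kept = []
    for pair in pairs:
        if pair.answer in pair.question:  # drop questions that leak their answer
            continue
        if llm_check(pair) and human_check(pair):
            kept.append(pair)
    return kept


# Toy seeds with placeholder identifiers (assumptions for illustration only).
SEEDS = [
    SeedCorrelation("VULN-EXAMPLE-1", "maps_to_weakness", "WEAKNESS-EXAMPLE-7"),
    SeedCorrelation("TECHNIQUE-EXAMPLE-2", "mitigated_by", "MITIGATION-EXAMPLE-3"),
]
CANDIDATES = [synthesize_qa(seed) for seed in SEEDS]
# Stub reviewers: a real pipeline would call an LLM judge and a human annotator here.
CURATED = curate(
    CANDIDATES,
    llm_check=lambda pair: pair.question.endswith("?"),
    human_check=lambda pair: True,
)
```

Because each answer is copied from the annotated seed rather than generated, the resulting pair stays grounded in its evidence, which is the property the template-based stage is meant to provide.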
In practice
- Implement CSKG-guided RAG for unstructured CTI synthesis.
- Apply query-expanded RAG for hybrid CTI tasks to align narratives.
- Inject authoritative CTI entries for structured reasoning tasks.
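As a rough illustration of the second and third bullets, the sketch below expands a narrative query with taxonomy vocabulary before retrieval, then injects the retrieved entries into the prompt. It rests on several assumptions: the token-overlap scorer stands in for a real retriever, the expansion table stands in for whatever expansion step the paper uses (possibly an LLM), and the corpus entries, identifiers, and function names are hypothetical.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand_query(query: str, expansions: Dict[str, List[str]]) -> str:
    """Append taxonomy terms for every trigger phrase found in the query."""
    lowered = query.lower()
    extra = [term for phrase, terms in expansions.items() if phrase in lowered for term in terms]
    return f"{query} {' '.join(extra)}" if extra else query


def retrieve(query: str, corpus: Dict[str, str], k: int = 1) -> List[str]:
    """Rank entries by token overlap with the query and drop entries with no overlap."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(text)), doc_id) for doc_id, text in corpus.items()]
    ranked = sorted((item for item in scored if item[0] > 0), key=lambda item: -item[0])
    return [doc_id for _, doc_id in ranked[:k]]


def build_prompt(question: str, corpus: Dict[str, str], doc_ids: List[str]) -> str:
    """Inject the retrieved entries as context ahead of the question."""
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in doc_ids)
    return f"Answer using only the context below.\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge-base entries written in taxonomy-style language.
CORPUS = {
    "ENTRY-CREDS": "Adversaries dump credentials from operating system memory.",
    "ENTRY-IMPACT": "Adversaries encrypt data on target systems to interrupt availability.",
    "ENTRY-PHISH": "Adversaries send spearphishing attachment emails to gain initial access.",
}
# Hypothetical expansion table mapping report wording to taxonomy wording.
EXPANSIONS = {"malicious invoice": ["spearphishing", "attachment", "emails"]}

QUESTION = "The report says the victim opened a malicious invoice document."
PLAIN_HITS = retrieve(QUESTION, CORPUS)
EXPANDED_HITS = retrieve(expand_query(QUESTION, EXPANSIONS), CORPUS)
PROMPT = build_prompt(QUESTION, CORPUS, EXPANDED_HITS)
```

In this toy setup the unexpanded report sentence shares no vocabulary with any entry, so `PLAIN_HITS` is empty, while the expanded query matches `ENTRY-PHISH`. That vocabulary gap between report narratives and taxonomy entries is the kind of misalignment query expansion is intended to close.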
Topics
- Cyber Threat Intelligence
- Large Language Models
- Retrieval-Augmented Generation
- CTI Benchmarking
- MITRE ATT&CK
- Cybersecurity Knowledge Graphs
Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, AI Security Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.