VeriCWEty: Embedding enabled Line-Level CWE Detection in Verilog

2026-03-20 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

VeriCWEty is an embedding-based framework for detecting and classifying Common Weakness Enumerations (CWEs) in Verilog hardware designs at both module-level and line-level granularity. Existing methods rely on rule-based checks or coarse structural analysis, so they often miss semantic vulnerabilities or fail to localize them precisely. VeriCWEty addresses this with vector embeddings generated by a decoder-only Large Language Model (LLM) fine-tuned on Verilog, which capture the syntactic and semantic nuances of CWEs. To label its buggy datasets automatically, the framework uses a voting scheme across LLMs such as Gemini3, GPT-5-Nano, and GPT-5.4, which improves annotation quality. It achieves approximately 89% precision in identifying critical CWEs such as CWE-1244 (Internal Asset Exposed to Unsafe Debug Access Level or State) and CWE-1245 (Improper Finite State Machines (FSMs) in Hardware Logic), and 96% accuracy in pinpointing line-level bugs, outperforming prior LLM-based approaches such as VerilogLAVD and Self-HWDebug.

Key takeaway

For hardware security analysts evaluating Verilog designs, VeriCWEty shows that embedding-based detection can localize vulnerabilities down to the individual line. Teams should consider adding embedding-based frameworks alongside traditional rule-based checks to get more precise, context-aware identification of CWEs such as CWE-1244 and CWE-1245 at both module and line levels. Pinpointing the exact location of a vulnerability can shorten verification effort and strengthen hardware security assurance.

Key insights

VeriCWEty uses LLM-generated vector embeddings for precise, granular hardware vulnerability detection in Verilog.

Principles

Vector embeddings capture semantic nuances of CWEs.
Voting-based LLM annotation improves dataset quality.
Combining module and line embeddings enhances context.
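
The voting principle above can be illustrated with a minimal sketch. The summary does not state the tie-breaking rule or agreement threshold used by VeriCWEty, so the `min_agree` parameter, the "clean" label, and the example votes below are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str], min_agree: int = 2) -> Optional[str]:
    """Return the label chosen by at least `min_agree` annotators, else None.

    Assumption: samples without enough agreement are dropped (None) rather
    than being assigned a label.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


# Hypothetical votes from three LLM annotators, keyed by source line number.
votes_per_line = {
    12: ["CWE-1244", "CWE-1244", "clean"],
    27: ["CWE-1245", "clean", "CWE-1244"],
    40: ["clean", "clean", "clean"],
}

labels = {line: majority_label(v) for line, v in votes_per_line.items()}
for line, label in labels.items():
    print(line, label)
```

Dropping samples where no label reaches the threshold trades dataset size for label quality; a different policy, such as deferring to a human reviewer, would fit the same interface.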

Method

VeriCWEty proceeds in three stages: generating labeled data via LLM voting, extracting embeddings with a Verilog-tuned LLM, and training an XGBoost classifier for module-level and line-level CWE detection.
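
One common way to turn per-token hidden states from a decoder-only LLM into a single fixed-length embedding is masked mean pooling. The summary does not say which pooling strategy VeriCWEty uses, so the sketch below is an assumption, and the tiny hand-written vectors stand in for real model outputs.

```python
from typing import List, Sequence


def mean_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, skipping padding."""
    if len(hidden) != len(mask):
        raise ValueError("hidden states and mask must have the same length")
    kept = [vec for vec, m in zip(hidden, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Hypothetical hidden states for one Verilog line: two real tokens, one pad.
token_states = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

line_embedding = mean_pool(token_states, attention_mask)
print(line_embedding)
```

The same function could be applied to a whole module's tokens to obtain a module-level embedding of the same dimensionality.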

In practice

Use 'ajn313/cl-verilog-1.0' for Verilog embedding extraction.
Employ an XGBoost classifier (XGBClassifier in the xgboost Python package) for robust handling of class imbalance.
Combine module and line embeddings for improved line-level context.
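
The last two tips can be sketched as follows, under stated assumptions: the module and line embeddings are combined by simple concatenation, and class imbalance is addressed with the negative-to-positive ratio that XGBoost's documentation suggests as a typical value for its `scale_pos_weight` parameter in binary classification. The summary confirms neither choice, and the embedding values and labels are made up.

```python
from typing import List, Sequence


def line_features(module_emb: Sequence[float], line_emb: Sequence[float]) -> List[float]:
    """Concatenate module context and line embedding into one feature vector."""
    return list(module_emb) + list(line_emb)


def scale_pos_weight(labels: Sequence[int]) -> float:
    """Ratio of negative to positive samples for a binary (buggy vs. clean) task."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("need at least one positive sample")
    return neg / pos


# Hypothetical embeddings: one module vector shared by four of its lines.
module_emb = [0.1, 0.2]
line_embs = [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0], [1.1, 1.2]]
y = [0, 0, 0, 1]  # 1 marks a buggy line (assumed binary labeling)

X = [line_features(module_emb, e) for e in line_embs]
weight = scale_pos_weight(y)
print(len(X[0]), weight)
```

In a real pipeline, `X`, `y`, and `weight` would be passed to the classifier, for example as `XGBClassifier(scale_pos_weight=weight).fit(X, y)`; multi-class CWE labels would need per-sample weights instead.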

Topics

Hardware Vulnerability Detection
Verilog CWE Detection
LLM Vector Embeddings
Line-Level Bug Detection
Common Weakness Enumeration

Best for: CTO, Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.