Exploring LLM-based Verilog Code Generation with Data-Efficient Fine-Tuning and Testbench Automation

2025-09-01 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Hardware Design & Electronic Design Automation · Depth: Expert, long

Summary

A new workflow uses multi-agent large language models (LLMs) to automate testbench generation and so produce high-quality fine-tuning data for Verilog code generation. It targets the scarcity of training data and testbenches for hardware description languages (HDLs). One agent produces Verilog modules from specifications and another generates testbenches to verify them, with the LLM agents integrated into existing verification tools. On the refined VerilogEval v2 benchmark, the fine-tuned MA-tb-7B model achieves a pass@1 rate of 68%, comparable to state-of-the-art methods such as CodeV-R1-7B-Distill (70%) and CodeV-R1-7B (w/ DAPO) (74%), while using significantly less training data. To build the training set, DeepSeek-R1 was deployed on 16 H100 GPUs to generate reasoning traces from 6,704 Pyra-tb dataset samples, processing over 6 million input tokens and 54 million output tokens in 55 hours.
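The pass@1 figures above are per-benchmark averages. As a minimal sketch of how such a number is commonly computed, assuming the standard unbiased pass@k estimator used for code-generation benchmarks (this summary does not state the exact evaluation script), the function names and the `(n, c)` input format below are illustrative only:

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass the testbench
    k: number of attempts the metric allows
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs (hypothetical format)."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)
```

For k = 1 the estimate reduces to c / n per problem, so a benchmark-level pass@1 is the mean fraction of passing samples across problems.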

Key takeaway

For AI scientists and machine learning engineers building hardware design automation tools, this research suggests that multi-agent LLM architectures for automated testbench generation can substantially reduce the data needed to fine-tune Verilog code generation models. Consider integrating a similar multi-agent framework into your development pipeline to improve data efficiency and verification coverage, which may accelerate the development of AI-assisted hardware design systems.

Key insights

Multi-agent LLMs can automate testbench generation, creating high-quality data for efficient Verilog code generation.

Principles

Automated testbench generation improves data quality.
Multi-agent LLMs enhance verification efficiency.
Data-efficient fine-tuning yields competitive performance.

Method

The workflow uses DeepSeek-R1 to generate reasoning traces from filtered PyraNet data, then fine-tunes base LLMs. A multi-agent framework, which includes quality-check and testbench-generation agents, produces the verification environments used to check generated modules.
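The generate-then-verify loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the three callables (`generate_module`, `generate_testbench`, `simulate`), the `Sample` record, and the retry limit are all hypothetical, and the sketch assumes the testbench agent works from the specification alone.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Sample:
    """One verified fine-tuning example (hypothetical record layout)."""
    spec: str
    module: str
    testbench: str


def build_dataset(
    specs: Iterable[str],
    generate_module: Callable[[str], str],     # agent 1: spec -> Verilog module
    generate_testbench: Callable[[str], str],  # agent 2: spec -> testbench
    simulate: Callable[[str, str], bool],      # verification tool: True on pass
    max_attempts: int = 3,                     # assumed retry budget
) -> List[Sample]:
    """Keep only (spec, module, testbench) triples that pass simulation."""
    kept: List[Sample] = []
    for spec in specs:
        testbench = generate_testbench(spec)
        for _ in range(max_attempts):
            module = generate_module(spec)
            if simulate(module, testbench):
                kept.append(Sample(spec, module, testbench))
                break  # stop retrying once one module passes
    return kept
```

In a real pipeline, `simulate` would wrap whichever simulator or verification tool is available, and the two generator callables would wrap LLM calls. Passing them in as parameters keeps the filtering logic testable with simple stubs.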

In practice

Use multi-agent LLMs for automated testbench creation.
Refine existing benchmarks for clearer LLM assessment.
Incorporate pre-generated testbenches to boost pass rates.

Topics

LLM-based Verilog Generation
Multi-Agent LLMs
Testbench Automation
Data-Efficient Fine-Tuning
VerilogEval v2 Benchmark

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.