Library-Aware Doubles and Iterative Repair for Large Language Model-Generated Unit Tests in OpenSIL Firmware

2025-09-06 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, extended

Summary

An automated unit test (UT) authoring workflow for Advanced Micro Devices' (AMD) Open-Source Silicon Initialization Library (openSIL) firmware reduces the manual effort of validating low-level C firmware changes. The LLM-guided multi-agent pipeline generates test scaffolds, creates or reuses library-aware stubs, mocks, and fakes, and runs an iterative compile-dispatch repair loop driven by build logs and line-coverage feedback. Evaluated on 76 functions under test (FUTs), the workflow produced compilable UTs for 73 of them (96.1%). On a 48-function subset, mean line coverage reached 98.8% with line-coverage guidance alone (LCA-only) and 94.7% when LCA was combined with vector-database (VDB) retrieval. Different stages of the pipeline use different models: GPT-4.1-mini, o4-mini, and o3.

Key takeaway

Firmware developers automating unit test generation in constrained C environments should prioritize iterative repair loops and coverage-guided refinement over single-pass LLM generation. Integrate retrieval-augmented generation to reuse existing test doubles, and enforce strict build constraints. The approach improves build success and line coverage, reduces manual debugging, and keeps the test suite consistent.

Key insights

Iterative LLM-guided repair and coverage feedback are crucial for generating robust unit tests in constrained firmware environments.

Principles

Buildability precedes test logic in firmware UTs.
Library reuse reduces linker errors and improves consistency.
Coverage feedback guides targeted test refinement.
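
As a rough illustration of the third principle, the sketch below turns a coverage report into a list of unexecuted lines that a refinement prompt could target. It assumes coverage arrives as an LCOV tracefile (the `SF:`, `DA:<line>,<count>`, and `end_of_record` records); the function name `uncovered_lines` and the sample path are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def uncovered_lines(tracefile_text):
    """Map each source file in an LCOV tracefile to its unexecuted line numbers."""
    missed = defaultdict(list)
    current = None  # source file of the record being read
    for raw in tracefile_text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current = line[3:]
        elif line.startswith("DA:") and current is not None:
            # DA:<line number>,<execution count>[,<checksum>]
            fields = line[3:].split(",")
            lineno, hits = int(fields[0]), int(fields[1])
            if hits == 0:
                missed[current].append(lineno)
        elif line == "end_of_record":
            current = None
    return {path: sorted(nums) for path, nums in missed.items()}


# Hypothetical tracefile fragment for a single function under test.
SAMPLE = """TN:
SF:/src/fut.c
DA:10,1
DA:11,0
DA:12,3
DA:15,0
end_of_record
"""

if __name__ == "__main__":
    print(uncovered_lines(SAMPLE))  # {'/src/fut.c': [11, 15]}
```

Feeding only the missed line numbers back to the model, rather than the whole report, keeps the refinement request small and specific.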

Method

The workflow uses a multi-agent LLM pipeline with retrieval-augmented generation, iterative compile-dispatch repair, and coverage-guided refinement to produce and improve unit tests.
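
The compile-and-revise part of this method can be sketched as a bounded loop. This is a minimal illustration under stated assumptions, not the paper's implementation: `build` and `revise` are hypothetical callbacks standing in for the real build system and the LLM repair agent, and the toy versions below only simulate a missing-include error.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BuildResult:
    ok: bool
    log: str


def repair_loop(
    draft: str,
    build: Callable[[str], BuildResult],
    revise: Callable[[str, str], str],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return the first candidate that builds, or None if the budget runs out."""
    candidate = draft
    for _ in range(max_rounds):
        result = build(candidate)
        if result.ok:
            return candidate
        # Hand the build log back so the next attempt targets the reported error.
        candidate = revise(candidate, result.log)
    return None


# Toy stand-ins (assumptions): a "compiler" that requires <stdint.h>
# and a "repair agent" that reacts to the resulting error message.
def fake_build(src: str) -> BuildResult:
    if "#include <stdint.h>" not in src:
        return BuildResult(False, "error: unknown type name 'uint32_t'")
    return BuildResult(True, "")


def fake_revise(src: str, log: str) -> str:
    if "uint32_t" in log:
        return "#include <stdint.h>\n" + src
    return src


if __name__ == "__main__":
    print(repair_loop("uint32_t x;", fake_build, fake_revise))
```

Capping the number of rounds matters in practice: a function that never builds should be reported as a failure rather than consuming model calls indefinitely.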

In practice

Implement a "do-not-redefine" list for symbols.
Confine LLM edits to template-approved code blocks.
Use LCOV for line-by-line coverage feedback.
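
The first two practices can be approximated with two small guards, sketched here under explicit assumptions. The marker comments (`BEGIN EDITABLE` / `END EDITABLE`), the symbol names (`LibReadReg`, `LibWriteReg`), and both function names are hypothetical; the regular expression is a heuristic for spotting C function definitions, not a parser, and it will miss signatures that contain nested parentheses.

```python
import re

# Heuristic: identifier, parenthesised parameter list without nesting, then "{".
DEFINITION = re.compile(r"\b([A-Za-z_]\w*)\s*\([^;{}()]*\)\s*\{")


def redefined_symbols(code, protected):
    """Return protected names that the generated C code appears to define."""
    defined = {m.group(1) for m in DEFINITION.finditer(code)}
    return sorted(defined & set(protected))


def replace_block(template, name, body):
    """Replace only the text between the named editable markers."""
    begin = f"/* BEGIN EDITABLE: {name} */"
    end = f"/* END EDITABLE: {name} */"
    start = template.find(begin)
    stop = template.find(end)
    if start == -1 or stop == -1 or stop < start:
        raise ValueError(f"editable block {name!r} not found in template")
    head = template[: start + len(begin)]
    tail = template[stop:]
    return f"{head}\n{body.strip()}\n{tail}"


# Hypothetical inputs for illustration only.
DO_NOT_REDEFINE = {"LibReadReg", "LibWriteReg"}

GENERATED = """uint32_t LibReadReg(uint32_t addr) {
    return 0;
}
void test_a(void) {
    if (LibWriteReg(1, 2)) { }
}
"""

TEMPLATE = """#include "fixture.h"
/* BEGIN EDITABLE: cases */
/* END EDITABLE: cases */
int main(void) { return run(); }
"""

if __name__ == "__main__":
    print(redefined_symbols(GENERATED, DO_NOT_REDEFINE))
    print(replace_block(TEMPLATE, "cases", "static void t(void) {}"))
```

Rejecting a candidate that redefines a library-provided symbol before it reaches the linker avoids one class of duplicate-definition errors, and splicing model output into a fixed template leaves includes and the test entry point untouched.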

Topics

Automated Test Generation
LLM Code Generation
Firmware Testing
openSIL
Unit Testing
Coverage-Guided Testing

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.