ConcoLixir: Reactive LLM Discovery Oracles for Python Concolic Testing

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

ConcoLixir is a novel reactive LLM extension designed to enhance Python concolic testing, addressing common limitations like symbolic downgrading, solver difficulties with semantic operations (e.g., regex, JSON parsing), and coverage plateaus. It integrates a Large Language Model as a "discovery oracle" that generates initial test seeds, suggests concrete inputs when the SMT solver fails, and targets uncovered code when coverage stalls. Each LLM-generated candidate is executed concolically, with only observed coverage and collected path constraints guiding subsequent exploration. Across synthetic, real-world, and library targets, ConcoLixir improved mean line coverage by 8.6, 15.1, and 17.0 percentage points, respectively, compared to a baseline concolic tester. These gains were most pronounced near semantic barriers and library boundaries, with the entire evaluation costing $1.63 in API charges using gpt-4o-mini-2024-07-18.

Key takeaway

If you are a software engineer or test automation specialist aiming to maximize Python test coverage, particularly in applications with complex string operations, parsers, or opaque library calls, consider integrating LLM-driven discovery into your concolic testing pipeline. This approach, exemplified by ConcoLixir, gets past semantic barriers that traditional SMT solvers struggle with and yields substantial coverage gains. Be mindful of the increased wall-clock time and API costs: use the LLM as a targeted complement for stalled exploration rather than as the primary test generator.

Key insights

LLMs can effectively complement concolic testing by generating concrete inputs to bypass semantic barriers, rather than replacing symbolic solvers.
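
To make "semantic barrier" concrete, here is a minimal sketch of the kind of Python code where a solver-only approach tends to stall. The function `handle_request` and the input `candidate` are hypothetical illustrations, not taken from the paper or its benchmarks. Reaching the "versioned" branch requires an input that is both valid JSON and matches a regex, which is easy to guess from reading the source but hard to derive from path constraints alone.

```python
import json
import re


def handle_request(payload: str) -> str:
    """Hypothetical target with two semantic barriers: JSON parsing and a regex."""
    try:
        # Barrier 1: the parser's behaviour is hard to model symbolically.
        data = json.loads(payload)
    except json.JSONDecodeError:
        return "bad-json"
    if not isinstance(data, dict):
        return "not-object"
    version = data.get("version", "")
    # Barrier 2: the regex match constrains the string's structure.
    if isinstance(version, str) and re.fullmatch(r"v\d+\.\d+", version):
        return "versioned"
    return "unversioned"


# A concrete input of the kind an LLM could plausibly propose after reading
# the source (an assumption for illustration, not an output from the paper).
candidate = '{"version": "v1.2"}'
print(handle_request(candidate))
```

The candidate is only useful if executing it actually reaches new code, which is why the summary stresses that candidates are run concolically rather than trusted.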

Method

ConcoLixir first performs static analysis to inform LLM seed generation. During concolic execution it invokes the LLM reactively, on solver failures and on coverage stalls, and once more for post-loop discovery. Every candidate is validated by executing it concolically.
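
The control flow described above can be sketched roughly as follows. This is a toy outline under stated assumptions, not ConcoLixir's implementation: `explore`, `run`, `solve`, `oracle`, the reason strings, and `stall_limit` are all hypothetical names, the "solver" and "LLM" are plain callables, and coverage is modelled as a set of location labels instead of real path constraints.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def explore(
    run: Callable[[str], set[str]],            # executes the target, returns covered labels
    solve: Callable[[str], Optional[str]],     # stand-in solver: target label -> input or None
    oracle: Callable[[str, set[str]], Iterable[str]],  # stand-in LLM: (reason, covered) -> inputs
    targets: list[str],
    stall_limit: int = 2,                      # hypothetical stall threshold
) -> set[str]:
    covered: set[str] = set()

    # Seed generation: ask the oracle for initial inputs.
    for seed in oracle("seed", covered):
        covered |= run(seed)

    stalled = 0
    for target in targets:
        if target in covered:
            continue
        before = len(covered)
        solved = solve(target)
        if solved is None:
            # Solver failure: fall back to oracle-suggested concrete inputs.
            candidates = list(oracle("solver-failure:" + target, covered))
        else:
            candidates = [solved]
        for candidate in candidates:
            # Only coverage observed by actually running the input counts.
            covered |= run(candidate)

        stalled = stalled + 1 if len(covered) == before else 0
        if stalled >= stall_limit:
            # Coverage stall: ask the oracle to aim at uncovered code.
            for candidate in oracle("stall", covered):
                covered |= run(candidate)
            stalled = 0

    # Post-loop discovery: one last round of oracle candidates.
    for candidate in oracle("post-loop", covered):
        covered |= run(candidate)
    return covered


# Toy stand-ins so the sketch runs end to end.
def toy_run(s: str) -> set[str]:
    cov = {"entry"}
    if s.startswith("{"):
        cov.add("json")
    if s == '{"k": 1}':
        cov.add("deep")
    return cov


def toy_solve(target: str) -> Optional[str]:
    return "{" if target == "json" else None   # "deep" is beyond this toy solver


def toy_oracle(reason: str, covered: set[str]) -> list[str]:
    if reason == "seed":
        return ["hello"]
    if reason.startswith("solver-failure"):
        return ['{"k": 1}']
    return []


result = explore(toy_run, toy_solve, toy_oracle, ["json", "deep"])
print(sorted(result))
```

The design point the sketch tries to capture is that the oracle never updates `covered` directly: its suggestions matter only through what `run` observes, which mirrors the summary's statement that only observed coverage and collected path constraints guide exploration.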

Best for: Research Scientist, AI Scientist, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.