ConcoLixir: Reactive LLM Discovery Oracles for Python Concolic Testing

2024-07-18 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

ConcoLixir is a reactive LLM extension for Python concolic testing. It targets three common limitations: symbolic downgrading, solver difficulty with semantic operations (e.g., regex matching, JSON parsing), and coverage plateaus. A Large Language Model acts as a "discovery oracle" that generates initial test seeds, suggests concrete inputs when the SMT solver fails, and targets uncovered code when coverage stalls. Each LLM-generated candidate is executed concolically, and only the observed coverage and collected path constraints guide subsequent exploration. Across synthetic, real-world, and library targets, ConcoLixir improved mean line coverage by 8.6, 15.1, and 17.0 percentage points, respectively, over a baseline concolic tester. The gains were most pronounced near semantic barriers and library boundaries, and the entire evaluation cost $1.63 in API charges using gpt-4o-mini-2024-07-18.

Key takeaway

Software engineers and test automation specialists who want to maximize Python test coverage, particularly in applications involving complex string operations, parsers, or opaque library calls, should integrate LLM-driven discovery into their concolic testing pipeline. This approach, exemplified by ConcoLixir, gets past semantic barriers that traditional SMT solvers struggle with and yields substantial coverage gains. Be mindful of the added wall-clock time and API costs: use the LLM as a targeted complement for stalled exploration rather than as the primary test generation method.

Key insights

LLMs can effectively complement concolic testing by generating concrete inputs to bypass semantic barriers, rather than replacing symbolic solvers.

Principles

LLMs serve as discovery oracles, not constraint solvers or correctness validators.
Integrate LLMs reactively, invoking them only upon SMT solver failure or coverage plateaus.
Prioritize the SMT solver; treat the LLM as an escalation path for the specific cases the solver cannot handle.
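The escalation order in these principles can be sketched as a small decision function. This is an illustrative sketch only, not ConcoLixir's actual code: the names `SolverStatus` and `next_action` and the plateau threshold of 3 stalled iterations are assumptions introduced here.

```python
from enum import Enum, auto


class SolverStatus(Enum):
    """Hypothetical outcome labels for one SMT query (not a real solver API)."""
    SAT = auto()
    UNSAT = auto()
    UNKNOWN = auto()
    ERROR = auto()


def next_action(status: SolverStatus, stalled_iterations: int,
                plateau_threshold: int = 3) -> str:
    """Decide which component proposes the next input.

    The threshold value is an assumption chosen for illustration.
    """
    # Solver failure: escalate to the LLM discovery oracle.
    if status in (SolverStatus.UNKNOWN, SolverStatus.ERROR):
        return "llm"
    # Coverage plateau: escalate even if the solver is still answering.
    if stalled_iterations >= plateau_threshold:
        return "llm"
    # Normal case: the solver produced a model, so use it.
    if status is SolverStatus.SAT:
        return "solver"
    # UNSAT without a plateau: the branch is infeasible, nothing to escalate.
    return "skip"


if __name__ == "__main__":
    for status in SolverStatus:
        print(status.name, "->", next_action(status, stalled_iterations=0))
```

Keeping the solver on the default path means the LLM is only paid for when symbolic reasoning has already run out of options, which is consistent with the cost concern raised in the takeaway.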

Method

ConcoLixir first performs static analysis to inform LLM seed generation. During concolic execution it invokes the LLM reactively, on solver failures or coverage stalls, and once more for post-loop discovery. Every candidate the LLM proposes is validated by executing it concolically.
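The validation step can be illustrated with a toy filter that keeps a candidate only if running it reaches lines not seen before. This is a simplified sketch under stated assumptions: it measures plain line coverage with `sys.settrace` instead of running a concolic engine, it ignores path constraints, and `target`, `lines_hit`, `keep_useful`, and `stub_oracle` are hypothetical names, with `stub_oracle` standing in for real LLM output.

```python
import json
import re
import sys
from typing import Callable, List, Set


def target(s: str) -> str:
    """Toy target guarded by two semantic barriers: a regex and a JSON parse."""
    if not re.fullmatch(r"[A-Z]{3}-\d{4}:.*", s):
        return "reject"
    payload = json.loads(s[9:])
    if payload.get("mode") == "admin":
        return "admin"
    return "user"


def lines_hit(func: Callable[[str], object], arg: str) -> Set[int]:
    """Run func(arg) and return the line numbers executed inside func itself."""
    hit: Set[int] = set()
    code = func.__code__

    def tracer(frame, event, _arg):
        if frame.f_code is code:
            if event == "line":
                hit.add(frame.f_lineno)
            return tracer
        return None  # do not trace library frames such as re or json

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(arg)
    except Exception:
        pass  # a crashing candidate still contributes the lines it reached
    finally:
        sys.settrace(previous)
    return hit


def keep_useful(candidates: List[str], func: Callable[[str], object]) -> List[str]:
    """Keep only candidates whose execution adds previously unseen lines."""
    covered: Set[int] = set()
    kept: List[str] = []
    for candidate in candidates:
        new_lines = lines_hit(func, candidate) - covered
        if new_lines:
            covered |= new_lines
            kept.append(candidate)
    return kept


def stub_oracle() -> List[str]:
    """Stand-in for LLM suggestions; a real oracle would call a model API."""
    return [
        "hello",
        'ABC-1234:{"mode": "user"}',
        'XYZ-0001:{"mode": "user"}',
        'ABC-1234:{"mode": "admin"}',
    ]


if __name__ == "__main__":
    print(keep_useful(stub_oracle(), target))
```

The point of the design is that the LLM's output is never trusted on its own: a suggestion only influences exploration through what its execution is observed to do.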

In practice

Generate inputs for regex, JSON parsers, or structured data with LLMs.
Utilize LLM-generated seeds to initiate concolic exploration.
Employ LLM discovery when SMT solvers return Unknown/Error results.
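For the seed-generation item above, one plausible way to give the LLM useful context is to collect string literals and library calls from the target's source and place them in the prompt. The summary does not describe what ConcoLixir's static analysis actually extracts, so the following is a guess at the general idea: `collect_hints`, `build_seed_prompt`, the sample `SOURCE`, and the prompt wording are all assumptions made for illustration.

```python
import ast

# Hypothetical target source; any Python module text would do.
SOURCE = r'''
import re, json

def handler(s):
    if not re.fullmatch(r"[A-Z]{3}-\d{4}", s[:8]):
        return "reject"
    data = json.loads(s[9:])
    return data.get("mode", "user")
'''


def collect_hints(source: str) -> dict:
    """Gather string literals and `name.attr(...)` calls from source code."""
    literals = set()
    calls = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            literals.add(node.value)
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Attribute)
              and isinstance(node.func.value, ast.Name)):
            calls.add(f"{node.func.value.id}.{node.func.attr}")
    return {"string_literals": sorted(literals), "calls": sorted(calls)}


def build_seed_prompt(hints: dict, n_seeds: int = 5) -> str:
    """Format the hints as a plain-text request for concrete seed inputs."""
    return (
        f"Propose {n_seeds} concrete string inputs for the function under test.\n"
        f"Calls observed: {', '.join(hints['calls'])}\n"
        f"String literals observed: {hints['string_literals']}\n"
        "Return one input per line."
    )


if __name__ == "__main__":
    print(build_seed_prompt(collect_hints(SOURCE)))
```

Whatever the model returns for such a prompt would still need to be executed and filtered by observed coverage before it is used as a seed.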

Topics

Concolic Testing
LLM Integration
Python Test Automation
Code Coverage
SMT Solvers
Test Input Generation

Code references

pschanely/CrossHair

Best for: Research Scientist, AI Scientist, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.