PBFuzz: Agentic Directed Fuzzing for PoV Generation

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Software Development & Engineering · Depth: Expert, extended

Summary

PBFuzz is an agentic directed fuzzing framework for efficient Proof-of-Vulnerability (PoV) input generation. It addresses the limitations of traditional directed greybox fuzzing and LLM-assisted methods by mimicking how human experts work: it iteratively analyzes code to extract semantic reachability and triggering constraints, hypothesizes PoV plans, encodes them as test inputs, and refines its understanding using debugging feedback. Key features include autonomous code reasoning, custom MCP tools for on-demand program analysis, persistent memory to prevent hypothesis drift, and property-based testing for efficient constraint solving. On the Magma benchmark, PBFuzz triggered 57 vulnerabilities within a 30-minute budget per target, 17 of them found exclusively by PBFuzz. This corresponds to a 25.6x efficiency gain over AFL++ with CmpLog, at an average API cost of $1.83 per vulnerability.

Key takeaway

AI security engineers and vulnerability researchers who need to generate Proof-of-Vulnerability (PoV) inputs, especially for complex software, should consider agentic, property-based fuzzing frameworks like PBFuzz. The approach significantly accelerates PoV discovery and improves consistency compared with traditional or basic LLM-assisted fuzzers, particularly for vulnerabilities that require intricate structural invariants. Be mindful of its limitations in systemic scope and of construct validity bias, which may call for hybrid strategies for certain bug types.

Key insights

PBFuzz leverages agentic LLMs and property-based testing to bridge semantic understanding and efficient PoV input generation.
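
One way to picture this bridge is a parameterized generator: fields that encode reachability constraints are fixed, and only the fields relevant to triggering the bug are sampled. The sketch below is a toy illustration of that idea, not PBFuzz's implementation. The file format, `toy_parser`, `ToyOverflow`, and every constant are hypothetical, and plain random sampling stands in for a property-based testing engine.

```python
import random
import struct

MAGIC = b"TOY1"  # hypothetical reachability constraint: other magics are rejected


class ToyOverflow(Exception):
    """Stands in for a sanitizer report in this toy target."""


def toy_parser(data: bytes) -> None:
    """Hypothetical target: 4-byte magic, 2-byte big-endian length, payload."""
    if len(data) < 6 or data[:4] != MAGIC:
        return  # early exit: the vulnerable code is never reached
    (declared_len,) = struct.unpack(">H", data[4:6])
    payload = data[6:]
    # Toy bug: a large declared length is trusted even when it exceeds the payload.
    if declared_len >= 0x100 and declared_len > len(payload):
        raise ToyOverflow(
            f"read of {declared_len} bytes from {len(payload)}-byte payload"
        )


def generate_input(rng: random.Random) -> bytes:
    """Generator with the reachability constraint fixed and trigger fields sampled."""
    payload = bytes(rng.randrange(256) for _ in range(rng.randrange(0, 16)))
    declared_len = rng.randrange(0, 0x400)  # free parameter explored by sampling
    return MAGIC + struct.pack(">H", declared_len) + payload


def find_pov(seed: int = 0, max_trials: int = 1000):
    """Sample the generator until the toy bug fires; return the input or None."""
    rng = random.Random(seed)
    for _ in range(max_trials):
        candidate = generate_input(rng)
        try:
            toy_parser(candidate)
        except ToyOverflow:
            return candidate
    return None
```

Because every generated input already satisfies the magic-number check, all sampling effort goes into the length field that decides whether the bug triggers, which is the efficiency argument for encoding constraints in a generator instead of mutating raw bytes.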

Method

PBFuzz uses a four-phase workflow: PLAN (infer constraints), IMPLEMENT (encode to generators), EXECUTE (PBT fuzzing), and REFLECT (diagnose/refine hypotheses).
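
The four phases can be sketched as a loop over a toy target. This is a minimal illustration under loud assumptions: in PBFuzz the PLAN and REFLECT steps are carried out by an LLM agent with program-analysis tools, whereas here they are replaced by hard-coded rules, and `Memory`, `target`, and all other names are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Persistent notes carried across iterations (hypothetical structure)."""
    hypotheses: list = field(default_factory=list)
    observations: list = field(default_factory=list)


def target(data: bytes) -> str:
    """Toy target that returns a debug-style status string instead of crashing."""
    if not data.startswith(b"HDR"):
        return "rejected: bad header"
    if len(data) < 8:
        return "rejected: too short"
    if data[3] == 0xFF:
        return "crash: out-of-bounds index"
    return "ok"


def plan(memory: Memory) -> dict:
    """PLAN: derive generator constraints from the feedback recorded so far."""
    constraints = {"prefix": b"", "min_len": 1}
    for obs in memory.observations:
        if "bad header" in obs:
            constraints["prefix"] = b"HDR"
        if "too short" in obs:
            constraints["min_len"] = 8
    memory.hypotheses.append(dict(constraints))
    return constraints


def implement(constraints: dict):
    """IMPLEMENT: encode the constraints as a random input generator."""
    def generator(rng: random.Random) -> bytes:
        needed = max(constraints["min_len"] - len(constraints["prefix"]), 0)
        body_len = needed + rng.randrange(1, 4)
        body = bytes(rng.randrange(256) for _ in range(body_len))
        return constraints["prefix"] + body
    return generator


def execute(generator, rng: random.Random, trials: int = 5000):
    """EXECUTE: sample the generator; return (crashing input or None, statuses)."""
    statuses = set()
    for _ in range(trials):
        data = generator(rng)
        status = target(data)
        statuses.add(status)
        if status.startswith("crash"):
            return data, statuses
    return None, statuses


def reflect(memory: Memory, statuses: set) -> None:
    """REFLECT: record feedback so the next PLAN step can refine its constraints."""
    memory.observations.extend(sorted(statuses))


def run(max_iterations: int = 5, seed: int = 0):
    """Iterate PLAN, IMPLEMENT, EXECUTE, REFLECT until a crash or the budget ends."""
    rng = random.Random(seed)
    memory = Memory()
    for _ in range(max_iterations):
        generator = implement(plan(memory))
        pov, statuses = execute(generator, rng)
        if pov is not None:
            return pov, memory
        reflect(memory, statuses)
    return None, memory
```

Calling `run()` returns the crashing input together with the `Memory` object, whose `hypotheses` list shows how the constraints tightened from one iteration to the next. Keeping that record outside any single iteration is the toy analogue of using persistent memory to avoid hypothesis drift.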

Best for: Research Scientist, AI Scientist, AI Security Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.