PBFuzz: Agentic Directed Fuzzing for PoV Generation

2026-06-30 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Software Development & Engineering · Depth: Expert, extended

Summary

PBFuzz is an agentic directed fuzzing framework for generating Proof-of-Vulnerability (PoV) inputs efficiently. It addresses the limitations of traditional directed greybox fuzzing and LLM-assisted methods by mimicking how human experts work: it iteratively analyzes code to extract semantic reachability and triggering constraints, hypothesizes PoV plans, encodes them as test inputs, and refines its understanding using debugging feedback. Key features include autonomous code reasoning, custom MCP tools for on-demand program analysis, persistent memory to prevent hypothesis drift, and property-based testing for efficient constraint solving. On the Magma benchmark, PBFuzz triggered 57 vulnerabilities within a 30-minute budget per target, 17 of which it alone triggered. This corresponds to a 25.6x efficiency gain over AFL++ with CmpLog, at an average API cost of $1.83 per vulnerability.

Key takeaway

AI security engineers and vulnerability researchers who need to generate Proof-of-Vulnerability (PoV) inputs, especially for complex software, should consider agentic, property-based fuzzing frameworks such as PBFuzz. Compared with traditional or basic LLM-assisted fuzzers, this approach significantly accelerates PoV discovery and improves consistency, particularly for vulnerabilities that require intricate structural invariants. Be mindful, however, of its limitations in systemic scope and its potential construct-validity bias, which may call for hybrid strategies for certain bug types.

Key insights

PBFuzz leverages agentic LLMs and property-based testing to bridge semantic understanding and efficient PoV input generation.

Principles

PoV generation amounts to finding counterexamples that violate safety properties.
Agentic LLMs can reason, plan, orchestrate tools, and self-reflect.
Property-based testing systematically explores the input space while preserving input validity.
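
To make the first and third principles concrete, here is a minimal sketch of PoV generation framed as a counterexample search. Everything in it is an assumption made for illustration: the toy input format (a `PB` magic, a one-byte declared length, then a payload), the `BUF_SIZE` constant, and the function names are hypothetical and are not taken from PBFuzz or Magma. The generator only emits inputs that already satisfy the reachability constraint (the magic bytes), so every trial exercises the code guarded by the safety property.

```python
import random

BUF_SIZE = 16  # hypothetical fixed buffer size in the toy target


def toy_target(data: bytes) -> bool:
    """Return True if the safety property holds, False if it is violated.

    Toy format (an assumption for this sketch): 2-byte magic b"PB",
    a 1-byte declared length, then the payload. The simulated bug is that
    the declared length is trusted and may exceed BUF_SIZE.
    """
    if len(data) < 3 or data[:2] != b"PB":
        return True  # rejected early; the vulnerable code is never reached
    declared = data[2]
    payload = data[3:3 + declared]
    # Safety property: never copy more than BUF_SIZE bytes into the buffer.
    return len(payload) <= BUF_SIZE


def gen_valid_input(rng: random.Random) -> bytes:
    """Generate inputs that always satisfy the reachability constraint."""
    declared = rng.randint(0, 255)
    payload_len = rng.randint(0, 64)
    payload = bytes(rng.randrange(256) for _ in range(payload_len))
    return b"PB" + bytes([declared]) + payload


def find_counterexample(seed: int = 0, max_trials: int = 1000):
    """Sample valid inputs until one violates the safety property."""
    rng = random.Random(seed)
    for trial in range(1, max_trials + 1):
        candidate = gen_valid_input(rng)
        if not toy_target(candidate):
            return trial, candidate  # candidate plays the role of a PoV
    return None


if __name__ == "__main__":
    print(find_counterexample())
```

Because invalid inputs are never generated, no trial is spent on inputs that the magic check would reject. That is the sense in which validity-preserving generation can be cheaper than mutating raw bytes.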

Method

PBFuzz uses a four-phase workflow: PLAN (infer constraints), IMPLEMENT (encode to generators), EXECUTE (PBT fuzzing), and REFLECT (diagnose/refine hypotheses).
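
The four phases can be sketched as a simple control loop. This is an illustrative skeleton only, not PBFuzz's implementation. In the described system an LLM agent performs PLAN and REFLECT using program-analysis tools and debugging feedback, whereas here those steps are replaced by hypothetical deterministic stubs that guess a minimum payload length and raise it by a fixed step after each failure. The toy target, the `Memory` class, and all function names are assumptions.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Memory:
    """Hypothetical state carried across iterations."""
    min_len: int = 0                              # current triggering hypothesis
    history: list = field(default_factory=list)   # rejected hypotheses


def target_crashes(data: bytes) -> bool:
    """Toy target: 'crashes' when the magic matches and payload exceeds 32 bytes."""
    return data[:2] == b"PB" and len(data) - 2 > 32


def plan(mem: Memory) -> dict:
    """PLAN: state the current reachability and triggering hypotheses."""
    return {"magic": b"PB", "min_len": mem.min_len}


def implement(hypothesis: dict) -> Callable[[random.Random], bytes]:
    """IMPLEMENT: encode the hypothesis as a parameterized input generator."""
    def generator(rng: random.Random) -> bytes:
        n = rng.randint(hypothesis["min_len"], hypothesis["min_len"] + 8)
        return hypothesis["magic"] + bytes(rng.randrange(256) for _ in range(n))
    return generator


def execute(generator, rng: random.Random, trials: int = 50) -> Optional[bytes]:
    """EXECUTE: sample the generator; return the first crashing input, if any."""
    for _ in range(trials):
        candidate = generator(rng)
        if target_crashes(candidate):
            return candidate
    return None


def reflect(mem: Memory, hypothesis: dict) -> None:
    """REFLECT: record the failed hypothesis and refine it (stubbed as +8)."""
    mem.history.append(hypothesis["min_len"])
    mem.min_len += 8


def run(max_iterations: int = 10, seed: int = 0):
    rng = random.Random(seed)
    mem = Memory()
    for iteration in range(1, max_iterations + 1):
        hypothesis = plan(mem)
        generator = implement(hypothesis)
        pov = execute(generator, rng)
        if pov is not None:
            return iteration, pov, mem
        reflect(mem, hypothesis)
    return None
```

The point of the structure is that each failed EXECUTE feeds REFLECT, which changes what the next PLAN proposes. A real agent would refine the hypothesis from observed feedback rather than by a blind increment.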

In practice

Use agentic LLMs for dynamic context search and hypothesis validation.
Employ persistent memory to prevent hypothesis drift.
Synthesize parameterized input generators for efficient constraint solving.
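
One way to read the persistent-memory recommendation is as a durable log of hypotheses and their outcomes that the agent consults before proposing the next one. The sketch below is an assumption about how such a log could look, not a description of PBFuzz's memory format: the `HypothesisLog` class, its JSON layout, and the `next_hypothesis` helper are hypothetical.

```python
import json
import os
import tempfile


class HypothesisLog:
    """Hypothetical JSON-backed log that survives across agent sessions."""

    def __init__(self, path: str):
        self.path = path
        self.entries = []
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.entries = json.load(fh)

    def record(self, hypothesis: str, outcome: str, evidence: str) -> None:
        """Append one result and write the whole log back to disk."""
        self.entries.append(
            {"hypothesis": hypothesis, "outcome": outcome, "evidence": evidence}
        )
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.entries, fh, indent=2)

    def rejected(self) -> set:
        """Hypotheses already refuted by earlier runs."""
        return {e["hypothesis"] for e in self.entries if e["outcome"] == "rejected"}


def next_hypothesis(candidates, log: HypothesisLog):
    """Return the first candidate that has not already been rejected."""
    seen = log.rejected()
    for candidate in candidates:
        if candidate not in seen:
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "memory.json")
        log = HypothesisLog(path)
        log.record("length field > 16", "rejected", "no crash in 50 trials")
        # A fresh session reloads the log and skips the refuted hypothesis.
        reloaded = HypothesisLog(path)
        print(next_hypothesis(["length field > 16", "length field > 32"], reloaded))
```

Writing the log to disk on every update means a restarted or context-truncated session still sees the earlier rejections, which is one plausible guard against revisiting refuted ideas.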

Topics

Agentic AI
Directed Fuzzing
Proof-of-Vulnerability
Property-Based Testing
Software Security
Vulnerability Discovery

Code references

R-Fuzz/magma

Best for: Research Scientist, AI Scientist, AI Security Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.