Agent trajectories as programs: fingerprinting and programming coding-agent behavior
Summary
This work introduces methods for comparing coding agents by how they work, defining "behavioral fingerprints" as the identifiable habits an agent shows while solving a task. The study analyzes ten agents across four scaffolds (SWE-agent, Agentless, DARS, Moatless) and models such as GPT, Claude, DeepSeek, and Qwen, and finds that agents can be identified from their fingerprints with 85.7% accuracy. Procedural representations are built with an emergent vocabulary induction technique and applied to the SWE-Bench dataset. Models from similar release periods behave similarly, as do distilled pairs: a student model and its teacher show a Jensen-Shannon divergence of 0.25. The study also releases ProcGrep, a library for auditing agent traces that offers deterministic, programmable search over agent trajectories and outperforms LLMs at episodic search.
Key takeaway
For AI engineers shaping coding agents, procedural fingerprints inform model selection and configuration. Use tools like ProcGrep to audit agent traces with deterministic search and fine-grained behavioral analysis. This lets you programmatically define and reward desired problem-solving styles, optimize for cost, and improve task-aware model routing, moving beyond simple success rates toward holistic agent evaluation.
Key insights
Agent problem-solving styles are unique "fingerprints" recoverable from procedural traces, enabling new evaluation and programming methods.
Principles
- Agents are identifiable by procedural habits at 85.7% accuracy.
- Distillation transfers procedural style: a student model and its teacher show a Jensen-Shannon divergence of 0.25.
- Procedural diversity does not correlate with agent capabilities.
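The divergence figure in the principles above compares how two agents distribute their actions. A minimal sketch of such a comparison follows. It assumes each trajectory is a list of action-token strings and that divergence is taken over token frequencies with base-2 logarithms, so scores fall between 0 and 1. The source does not state the representation or log base behind the 0.25 figure, and all function names here are illustrative. Attributing a trace to the nearest known distribution is shown only as one simple use of the score, not as the study's classifier.

```python
import math
from collections import Counter

def action_distribution(trajectories):
    """Turn a list of trajectories (lists of action tokens) into token probabilities."""
    counts = Counter(token for traj in trajectories for token in traj)
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}

def kl_divergence(p, q):
    """KL(p || q) in bits; q must be positive wherever p is positive."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    keys = set(p) | set(q)
    p_full = {k: p.get(k, 0.0) for k in keys}
    q_full = {k: q.get(k, 0.0) for k in keys}
    mixture = {k: 0.5 * (p_full[k] + q_full[k]) for k in keys}
    return 0.5 * kl_divergence(p_full, mixture) + 0.5 * kl_divergence(q_full, mixture)

def closest_agent(unknown, known):
    """Return the name in `known` whose distribution is nearest to `unknown`."""
    return min(known, key=lambda name: js_divergence(unknown, known[name]))

# Hypothetical toy traces; real trajectories would be far longer.
teacher = action_distribution([["search", "open", "edit", "test", "submit"]])
student = action_distribution([["search", "open", "edit", "edit", "submit"]])
other = action_distribution([["edit", "edit", "edit", "submit"]])

print(js_divergence(student, teacher))
print(closest_agent(student, {"teacher": teacher, "other": other}))
```

A lower score means the two agents spread their actions more alike. Richer fingerprints (for example, frequencies of multi-step patterns rather than single actions) can be plugged into the same functions.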
Method
Build procedural representations by inducing an emergent vocabulary with byte-pair encoding (BPE). Vocabulary stability is assessed with V-measure (K=192), the harmonic mean of homogeneity and completeness.
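The BPE step can be illustrated on action sequences instead of characters: the most frequent adjacent pair of tokens is merged into a new token, and the process repeats. This is a minimal sketch under assumptions not stated in the source: trajectories are lists of action-name strings, merged tokens are joined with "+", ties are broken alphabetically, and merging stops when no pair occurs at least `min_count` times. It does not reproduce the study's pipeline or its V-measure stability check.

```python
from collections import Counter

def most_frequent_pair(sequences):
    """Return (pair, count) for the most common adjacent token pair, or (None, 0)."""
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))
    if not pairs:
        return None, 0
    # Sorting first makes tie-breaking deterministic (alphabetically smallest pair wins).
    best = max(sorted(pairs), key=lambda pair: pairs[pair])
    return best, pairs[best]

def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + "+" + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged

def induce_vocabulary(sequences, num_merges, min_count=2):
    """Run up to `num_merges` BPE merges; return the merge list and re-tokenized sequences."""
    sequences = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pair, count = most_frequent_pair(sequences)
        if pair is None or count < min_count:
            break
        merges.append(pair)
        sequences = [merge_pair(seq, pair) for seq in sequences]
    return merges, sequences

# Hypothetical toy traces.
traces = [
    ["search", "open", "edit", "test"],
    ["search", "open", "edit", "test"],
    ["open", "edit", "submit"],
]
merges, tokenized = induce_vocabulary(traces, num_merges=2)
print(merges)
print(tokenized)
```

Each merged token stands for a recurring multi-step habit, which is what makes the induced vocabulary useful as a procedural representation.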
In practice
- Use ProcGrep for deterministic, programmable search over agent traces.
- Implement procedural rewards for fine-grained agent behavior shaping.
- Monitor agent behavior for task-aware model routing and cost analysis.
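The first two practices above can be sketched with the standard library alone. The example below does not use ProcGrep, whose API is not described here. It assumes trajectories are lists of whitespace-free action tokens, encodes each as a space-delimited string, and treats a behavioral pattern as a regular expression over that string. The function names, patterns, and reward weights are all hypothetical.

```python
import re

def encode(trajectory):
    """Join action tokens with spaces, padded so ' token ' matches whole tokens only."""
    return " " + " ".join(trajectory) + " "

def find_matches(trajectories, pattern):
    """Return ids of trajectories whose encoded actions match the regex, in input order."""
    compiled = re.compile(pattern)
    return [tid for tid, traj in trajectories.items() if compiled.search(encode(traj))]

def procedural_reward(trajectory, wanted, unwanted, bonus=1.0, penalty=1.0):
    """Add `bonus` per wanted pattern present and subtract `penalty` per unwanted one."""
    text = encode(trajectory)
    score = sum(bonus for pattern in wanted if re.search(pattern, text))
    score -= sum(penalty for pattern in unwanted if re.search(pattern, text))
    return score

# Hypothetical patterns over the encoded action string.
EDIT_THEN_TEST = r" edit (?:\S+ )*test (?:\S+ )*submit "  # a test runs between edit and submit
SUBMIT_UNTESTED = r"^(?!.* test ).* submit $"             # submits without ever testing
EDIT_LOOP = r"(?: edit){3,} "                             # three or more edits in a row

runs = {
    "run-1": ["search", "open", "edit", "test", "submit"],
    "run-2": ["open", "edit", "submit"],
    "run-3": ["edit", "edit", "edit", "submit"],
}

print(find_matches(runs, EDIT_THEN_TEST))
print(find_matches(runs, SUBMIT_UNTESTED))
for run_id, traj in runs.items():
    print(run_id, procedural_reward(traj, wanted=[EDIT_THEN_TEST], unwanted=[EDIT_LOOP]))
```

Because the search is a plain pattern match, the same query always returns the same runs, which is the property that makes this style of audit reproducible.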
Topics
- Agent Behavior Analysis
- Procedural Fingerprinting
- Coding Agents
- LLM Evaluation
- ProcGrep
- Behavioral Programming
Best for: Research Scientist, AI Engineer, Machine Learning Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.