LLM Agents Can See Code Repositories
Summary
The paper introduces SeeRepo, a multimodal framework for LLM-powered coding agents designed to improve software engineering tasks by integrating visual structural context with traditional text interfaces. Experiments across GPT-5-mini, GPT-5.1, Doubao-Seed-2.0-Lite, and Kimi K2.5 on SWE-bench Verified reveal that vision-only context degrades accuracy by 13.6% to 34.1% and inflates token costs by up to 268%. However, combining visual context graphs with text reduces input token consumption by up to 26% and overall cost by up to 46% while maintaining or improving issue-resolution accuracy. Visual tools are most effective during the fault localization stage, and graph-based layouts with agent-decided exploration depth offer the best efficiency.
Key takeaway
For AI engineers building LLM-powered coding agents, multimodal repository representations such as SeeRepo can cut token use and cost without sacrificing accuracy. Prioritize hybrid text-plus-visual interfaces with graph-based layouts and agent-decided exploration depth, and invoke visual tools mainly during the fault localization stage. Avoid vision-only approaches, which lowered accuracy by 13.6% to 34.1% and raised token costs by up to 268% in the reported experiments.
Key insights
Combining visual structural context with text reduces a coding agent's input tokens (by up to 26%) and overall cost (by up to 46%) on repository tasks while maintaining or improving issue-resolution accuracy.
Principles
- Vision-only context degrades LLM agent performance.
- Hybrid text+vision improves efficiency.
- Graph layouts are most token-efficient.
Method
SeeRepo constructs AST-based multi-relation dependency graphs, rendering query-centered Graphviz subgraphs as PNG images alongside text for agents.
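The pipeline described above can be illustrated with a minimal sketch. It is not SeeRepo's implementation: the function names `build_edges` and `to_dot`, the relation labels ("imports", "contains", "inherits", "calls"), and the naming of nodes are all assumptions chosen for illustration. It uses only Python's standard `ast` module to extract typed edges and emits Graphviz DOT text with the queried node highlighted.

```python
import ast


def build_edges(sources):
    """Extract (src, relation, dst) edges from a {module_name: source_code} map.

    Hypothetical simplification: names are resolved by identifier only,
    and methods are named as if they were module-level functions.
    """
    edges = set()
    for module, code in sources.items():
        tree = ast.parse(code)
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    edges.add((module, "imports", alias.name))
            elif isinstance(node, ast.ImportFrom) and node.module:
                edges.add((module, "imports", node.module))
            elif isinstance(node, ast.ClassDef):
                qual = f"{module}.{node.name}"
                edges.add((module, "contains", qual))
                for base in node.bases:
                    if isinstance(base, ast.Name):
                        edges.add((qual, "inherits", base.id))
            elif isinstance(node, ast.FunctionDef):
                qual = f"{module}.{node.name}"
                edges.add((module, "contains", qual))
                # Record direct calls to plain names inside the function body.
                for inner in ast.walk(node):
                    if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                        edges.add((qual, "calls", inner.func.id))
    return edges


def to_dot(edges, center):
    """Serialize edges as Graphviz DOT, highlighting the queried node."""
    lines = ["digraph repo {", f'  "{center}" [style=filled, fillcolor=lightblue];']
    for src, rel, dst in sorted(edges):
        lines.append(f'  "{src}" -> "{dst}" [label="{rel}"];')
    lines.append("}")
    return "\n".join(lines)
```

Saving the returned DOT text to a file and running the Graphviz command `dot -Tpng graph.dot -o graph.png` would produce a PNG that could be passed to a multimodal model alongside the text context.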
In practice
- Use graph-based layouts for repository visualization.
- Let the agent choose the exploration depth for each graph query instead of fixing it in advance.
- Prioritize visual tools during fault localization.
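One way to expose agent-decided depth is to make the hop limit a parameter of the graph query tool. The sketch below is an assumption about how such a tool could work, not the paper's code: `query_subgraph` is a hypothetical name, edges are `(src, relation, dst)` tuples, and relations are traversed in both directions so that callers and callees of the queried node are both included.

```python
from collections import deque


def query_subgraph(edges, center, depth):
    """Return the edges whose endpoints both lie within `depth` hops of `center`."""
    # Build an undirected adjacency map (assumption: direction is ignored
    # for reachability, but preserved in the returned edges).
    neighbors = {}
    for src, _rel, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    # Breadth-first search that stops expanding at the requested depth.
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    return [e for e in edges if e[0] in dist and e[2] in dist]


# Toy call chain: a -> b -> c -> d
chain = [("a", "calls", "b"), ("b", "calls", "c"), ("c", "calls", "d")]
near = query_subgraph(chain, "b", 1)   # edges among a, b, c
wide = query_subgraph(chain, "a", 3)   # the whole chain
```

An agent could start with a small depth during fault localization and request a larger one only when the first image does not contain the suspect code, which keeps each rendered graph small.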
Topics
- LLM Agents
- Multimodal LLMs
- Code Repositories
- Software Engineering
- Fault Localization
- Graph Visualization
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.