LLM Agents Can See Code Repositories

2026-06-16 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

The paper introduces SeeRepo, a multimodal framework for LLM-powered coding agents designed to improve software engineering tasks by integrating visual structural context with traditional text interfaces. Experiments across GPT-5-mini, GPT-5.1, Doubao-Seed-2.0-Lite, and Kimi K2.5 on SWE-bench Verified reveal that vision-only context degrades accuracy by 13.6% to 34.1% and inflates token costs by up to 268%. However, combining visual context graphs with text reduces input token consumption by up to 26% and overall cost by up to 46% while maintaining or improving issue-resolution accuracy. Visual tools are most effective during the fault localization stage, and graph-based layouts with agent-decided exploration depth offer the best efficiency.

Key takeaway

For AI engineers building LLM-powered coding agents, the results favor hybrid text-plus-visual repository interfaces such as SeeRepo over either modality alone. Use graph-based layouts, let the agent choose the exploration depth, and invoke visual tools mainly during fault localization, where they help most. In the reported experiments this combination cut input tokens by up to 26% and overall cost by up to 46% without lowering resolution accuracy. Avoid vision-only context, which reduced accuracy and raised token costs.

Key insights

Adding visual structural context to text lowers a coding agent's token use and cost on repository tasks while maintaining or improving accuracy.

Principles

Vision-only context degrades LLM agent performance.
Hybrid text+vision improves efficiency.
Graph layouts are most token-efficient.

Method

SeeRepo builds multi-relation dependency graphs from the repository's ASTs, then uses Graphviz to render query-centered subgraphs as PNG images that the agent receives alongside text.
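
To make the graph-construction step concrete, here is a minimal Python sketch. It is not SeeRepo's implementation: the relation names (`imports`, `contains`, `inherits`, `calls`), the node-naming scheme, and the helpers `build_graph` and `to_dot` are all assumptions chosen for illustration. It parses source text with the standard `ast` module and emits Graphviz DOT text, which the Graphviz CLI can turn into a PNG (for example `dot -Tpng graph.dot -o graph.png`).

```python
import ast
from collections import defaultdict
from typing import Dict, Set, Tuple

# Hypothetical edge store: node name -> set of (relation, target) pairs.
Graph = Dict[str, Set[Tuple[str, str]]]


def build_graph(sources: Dict[str, str]) -> Graph:
    """Build a small multi-relation graph from {module_name: source_text}.

    Simplification: methods are named "module.name" like top-level
    functions, and only calls to plain names (not attributes) are recorded.
    """
    edges: Graph = defaultdict(set)
    for module, text in sources.items():
        tree = ast.parse(text)
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    edges[module].add(("imports", alias.name))
            elif isinstance(node, ast.ImportFrom) and node.module:
                edges[module].add(("imports", node.module))
            elif isinstance(node, ast.ClassDef):
                cls = f"{module}.{node.name}"
                edges[module].add(("contains", cls))
                for base in node.bases:
                    if isinstance(base, ast.Name):
                        edges[cls].add(("inherits", base.id))
            elif isinstance(node, ast.FunctionDef):
                fn = f"{module}.{node.name}"
                edges[module].add(("contains", fn))
                for sub in ast.walk(node):
                    if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                        edges[fn].add(("calls", sub.func.id))
    return dict(edges)


def to_dot(edges: Graph, center: str) -> str:
    """Serialize the graph as DOT text, highlighting the query node."""
    lines = ["digraph repo {", f'  "{center}" [style=filled];']
    for src in sorted(edges):
        for relation, dst in sorted(edges[src]):
            lines.append(f'  "{src}" -> "{dst}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = {"app": "import os\nclass B(A):\n    pass\ndef run():\n    helper()\n"}
    print(to_dot(build_graph(demo), center="app"))
```

Labeling each edge with its relation keeps several dependency types in one picture, which is one plausible reading of "multi-relation"; the paper's actual relation set may differ.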

In practice

Use graph-based layouts for repository visualization.
Let the agent choose the exploration depth for each graph query.
Prioritize visual tools during fault localization.
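
One way to read "dynamic depth" is that the agent passes a hop limit with each graph query, and the tool returns only the neighborhood of the queried node within that limit. The sketch below illustrates that idea with a breadth-first search. The `neighborhood` function, the `(source, relation, target)` edge tuples, and the choice to ignore edge direction are assumptions for illustration, not details taken from the paper.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target), hypothetical format


def neighborhood(edges: List[Edge], center: str, depth: int) -> List[Edge]:
    """Return the edges whose endpoints both lie within `depth` hops of `center`.

    Direction is ignored while searching, so callers and callees of the
    query node are both included. Input edge order is preserved.
    """
    adjacent: Dict[str, Set[str]] = {}
    for src, _, dst in edges:
        adjacent.setdefault(src, set()).add(dst)
        adjacent.setdefault(dst, set()).add(src)

    hops = {center: 0}  # node -> distance from the query node
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if hops[node] == depth:
            continue  # do not expand past the requested depth
        for nxt in adjacent.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)

    return [e for e in edges if e[0] in hops and e[2] in hops]


if __name__ == "__main__":
    chain = [("a", "calls", "b"), ("b", "calls", "c"), ("c", "calls", "d")]
    # The agent might start shallow and widen only if the view is insufficient.
    for d in (1, 2):
        print(d, neighborhood(chain, "b", d))
```

A small depth keeps the rendered image compact, and a larger depth trades more visual tokens for wider context, so exposing the parameter leaves that trade-off to the agent.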

Topics

LLM Agents
Multimodal LLMs
Code Repositories
Software Engineering
Fault Localization
Graph Visualization

Code references

cslsolow/SeeRepo

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.