Neurosymbolic Repo-level Code Localization
Summary
Code localization is a key component of autonomous software engineering, but existing benchmarks for it carry a "Keyword Shortcut" bias. Their queries contain abundant keyword references such as file paths and function names, so models can post impressive scores through superficial lexical matching rather than genuine structural reasoning. To address this, the authors formalize the Keyword-Agnostic Logical Code Localization (KA-LCL) challenge and introduce KA-LogicQuery, a diagnostic benchmark whose queries require structural reasoning and offer no naming hints. State-of-the-art approaches show a catastrophic performance drop on KA-LogicQuery, which exposes their lack of deterministic reasoning. In response, the authors propose LogicLoc, an agentic framework that combines large language models (LLMs) with Datalog's logical reasoning for precise localization. LogicLoc extracts program facts from the codebase, has an LLM synthesize Datalog programs under parser-gated validation and mutation-based feedback, and executes those programs on a high-performance engine, which yields accurate and verifiable localization.
Key takeaway
Research scientists developing autonomous software engineering tools should prioritize evaluating code localization models on benchmarks such as KA-LogicQuery, which demand structural reasoning rather than lexical matching. Relying solely on current issue-driven benchmarks risks deploying systems that fail catastrophically in real-world scenarios where queries carry no explicit keyword hints. Consider integrating neurosymbolic approaches, such as LogicLoc's Datalog-based framework, to make localization more robust, verifiable, and efficient.
Key insights
Existing code localization benchmarks exhibit a "Keyword Shortcut" bias, hindering genuine structural reasoning in models.
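One rough way to see the "Keyword Shortcut" in a query is to measure how many name tokens of the correct location already appear in the query text. The sketch below is an illustrative assumption, not a metric from the original article: the function names, the example queries, and the gold location are all hypothetical.

```python
import re


def identifier_tokens(text):
    """Lowercased word tokens, with snake_case and camelCase names split apart."""
    tokens = set()
    for word in re.findall(r"[A-Za-z][A-Za-z0-9_]*", text):
        for part in word.split("_"):
            for piece in re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part):
                tokens.add(piece.lower())
    return tokens


def keyword_overlap(query, gold_locations):
    """Fraction of the gold location's name tokens that the query mentions."""
    gold = identifier_tokens(" ".join(gold_locations))
    if not gold:
        return 0.0
    return len(gold & identifier_tokens(query)) / len(gold)


# Hypothetical gold location: a file path and a function name.
gold = ["config/loader.py", "parse_config"]

# Issue-style query that names the file and the function outright.
issue_query = "Crash in parse_config when loading settings.yaml from config/loader.py"

# Keyword-agnostic query that describes structure only.
agnostic_query = "Find every function that an entry point can reach and that opens a file for writing."

issue_score = keyword_overlap(issue_query, gold)
agnostic_score = keyword_overlap(agnostic_query, gold)
print(issue_score, agnostic_score)
```

A query with high overlap can be answered by string matching alone, while a query with zero overlap leaves a model nothing to match against except the structure of the code.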
Principles
- Structural reasoning is critical for robust code localization.
- Deterministic engines reduce LLM inference overhead.
Method
LogicLoc extracts program facts from the codebase, uses an LLM to synthesize Datalog programs over those facts under parser-gated validation and mutation-based feedback, and then executes the validated programs on a high-performance engine for accurate, verifiable code localization.
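To make the fact-extraction and execution steps concrete, here is a minimal sketch in plain Python. It is not LogicLoc's implementation: the `calls` fact schema, the helper names, and the sample source are assumptions for illustration, and a naive fixpoint loop stands in for a real Datalog engine. The query it answers corresponds to the Datalog rules shown in the comments.

```python
import ast

# Hypothetical codebase under analysis.
SOURCE = '''
def write_file(path, data):
    open(path, "w").write(data)

def save(record):
    write_file("out.txt", record)

def handler(request):
    save(request)

def load():
    return 1

def main():
    load()
'''


def extract_call_facts(source):
    """Return calls(caller, callee) facts for direct calls to plain names.

    Simplification: method calls such as obj.method() are ignored.
    """
    facts = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    facts.add((node.name, inner.func.id))
    return facts


def transitive_closure(edges):
    """Naive fixpoint evaluation of:
        reaches(X, Y) :- calls(X, Y).
        reaches(X, Z) :- reaches(X, Y), calls(Y, Z).
    """
    reaches = set(edges)
    while True:
        derived = {(x, z) for (x, y) in reaches for (y2, z) in edges if y == y2}
        if derived <= reaches:  # nothing new was derived, so we are at the fixpoint
            return reaches
        reaches |= derived


def functions_reaching(edges, target):
    """Evaluate: answer(X) :- reaches(X, target)."""
    return {x for (x, y) in transitive_closure(edges) if y == target}


calls = extract_call_facts(SOURCE)
result = functions_reaching(calls, "open")
print(sorted(result))
```

The query never mentions `save` or `handler` by name. They are found only because the call graph connects them to `open`, which is the kind of structural answer that lexical matching cannot produce.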
In practice
- Evaluate code localization models on keyword-agnostic benchmarks.
- Integrate Datalog for deterministic structural reasoning.
- Offload structural traversal to specialized engines.
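The idea of gating LLM-written Datalog behind a parser can be sketched as a retry loop. Everything here is an assumption for illustration: the toy rule grammar, the `stub_generator` that stands in for an LLM call, and the retry budget. The mutation-based feedback mentioned in the summary is not modeled.

```python
import re

# Toy grammar: an atom is name(arg, ...); a rule is "head." or "head :- body, ... ."
ATOM = r"[a-z]\w*\(\s*\w+(?:\s*,\s*\w+)*\s*\)"
RULE = re.compile(rf"^{ATOM}(?:\s*:-\s*{ATOM}(?:\s*,\s*{ATOM})*)?\s*\.$")


def parse_errors(program):
    """Return one message per line that the toy grammar rejects."""
    errors = []
    for number, line in enumerate(program.strip().splitlines(), start=1):
        line = line.strip()
        if line and not RULE.match(line):
            errors.append(f"line {number}: cannot parse {line!r}")
    return errors


# The first candidate is missing the period that should end its first rule.
BAD = "reaches(X, Y) :- calls(X, Y)\nreaches(X, Z) :- reaches(X, Y), calls(Y, Z)."
GOOD = "reaches(X, Y) :- calls(X, Y).\nreaches(X, Z) :- reaches(X, Y), calls(Y, Z)."


def stub_generator(feedback):
    """Hypothetical stand-in for an LLM: it repairs the program once it sees errors."""
    return GOOD if feedback else BAD


def synthesize(generate, max_attempts=3):
    """Request programs until one parses; pass parser errors back as feedback."""
    feedback = []
    for attempt in range(1, max_attempts + 1):
        program = generate(feedback)
        feedback = parse_errors(program)
        if not feedback:
            return program, attempt
    raise ValueError(f"no parsable program after {max_attempts} attempts: {feedback}")


program, attempts = synthesize(stub_generator)
print(attempts)
print(program)
```

Only a program that passes the parser would be handed to the engine, so malformed output costs a retry rather than a wrong or failed query.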
Topics
- Code Localization
- Keyword Shortcut Bias
- KA-LogicQuery Benchmark
- LogicLoc Framework
- Neurosymbolic AI
Best for: Research Scientist, AI Scientist, AI Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.