Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning
Summary
LaMR (Latent Multi-Rubric) is a structured pruning framework for LLM-powered coding agents that targets the excessive token consumption caused by irrelevant repository files. Existing pruners use a single-objective sequence labeler, which creates a modeling bottleneck: one CRF transition prior must handle both contiguous semantic spans and sparse structural support lines. LaMR decomposes code relevance into two interpretable quality dimensions, semantic evidence and dependency support, each modeled by a dedicated Conditional Random Field (CRF) with dimension-specific transition dynamics. A mixture-of-experts gating network dynamically weights these per-rubric emissions, and a final CRF layer produces the aggregate keep-or-prune decision. LaMR derives its multi-rubric labels from existing training corpora via AST-based program analysis, which also denoises the teacher's binary labels. In experiments on SWE-Bench Verified, SWE-QA, LCC, and LongCodeQA, LaMR wins 12 of 16 multi-turn comparisons, saves up to 31% more tokens, and improves Exact Match by up to +3.5 on single-turn tasks, often outperforming unpruned baselines.
Key takeaway
For research scientists developing LLM-powered coding agents, LaMR offers a more effective approach to context management than single-objective pruners. By explicitly separating semantic and dependency relevance, agents can save more tokens (up to 31%) and improve task performance (up to +3.5 EM). Consider integrating LaMR as middleware to improve efficiency and reasoning precision, especially with stronger backbone models such as Claude Opus 4.6, where traditional pruning can increase token usage.
Key insights
Decomposing code relevance into semantic and dependency rubrics improves LLM coding agent context pruning.
Principles
- Code relevance is multi-dimensional.
- Single-objective pruning creates a modeling bottleneck.
- AST-based analysis can denoise training labels.
Method
LaMR uses parallel CRF heads for semantic and dependency relevance, dynamically weighted by a query-adaptive Mixture-of-Experts gate, with AST-derived labels and a final fused CRF for pruning.
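The fusion step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the two rubric heads have already produced per-line emission scores, treats the gate as two raw logits passed through a softmax, and decodes the fused scores with a plain Viterbi pass over a two-label linear-chain CRF. The function names, score values, and transition matrix are all hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_emissions(semantic, dependency, gate_logits):
    """Mix per-rubric emission scores using gate weights.

    semantic, dependency: one [prune_score, keep_score] pair per code line.
    gate_logits: two raw scores, standing in for a gating network's output.
    """
    w_sem, w_dep = softmax(gate_logits)
    return [
        [w_sem * s[k] + w_dep * d[k] for k in range(2)]
        for s, d in zip(semantic, dependency)
    ]

def viterbi(emissions, transitions):
    """Highest-scoring label sequence (0 = prune, 1 = keep)."""
    score = list(emissions[0])
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for cur in range(2):
            cands = [score[prev] + transitions[prev][cur] for prev in range(2)]
            best_prev = max(range(2), key=lambda p: cands[p])
            new_score.append(cands[best_prev] + emissions[t][cur])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    label = max(range(2), key=lambda k: score[k])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path

# Hypothetical scores for four code lines.
semantic = [[0, 2], [0, 2], [1, 0], [1, 0]]      # lines 0-1 look semantically relevant
dependency = [[1, 0], [1, 0], [1, 0], [0, 3]]    # line 3 looks like a structural support line
fused = fuse_emissions(semantic, dependency, gate_logits=[0.0, 0.0])
transitions = [[0.2, 0.0], [0.0, 0.2]]           # mild preference for staying in the same label
decision = viterbi(fused, transitions)
```

With equal gate weights, the decoded sequence keeps the contiguous semantic span and the isolated dependency line while pruning the line neither rubric supports. Shifting the gate logits toward one rubric changes which evidence dominates the final decision.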
In practice
- Use AST analysis to recover sparse structural support lines.
- Employ multi-rubric CRFs for distinct relevance types.
- Combine with history managers for full context control.
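As a rough illustration of the first point, the sketch below uses Python's standard `ast` module to add structural lines back to a set of semantically selected lines. The rule it applies (keep every import, plus the header of any function or class enclosing a kept line) is an assumption chosen for illustration, not the labeling procedure from the paper, and `structural_support_lines` is a hypothetical helper name.

```python
import ast

def structural_support_lines(source, kept_lines):
    """Return kept_lines plus import statements and the header line of any
    function or class that encloses a kept line (1-indexed line numbers)."""
    tree = ast.parse(source)
    support = set(kept_lines)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # Keep every line of the import statement.
            support.update(range(node.lineno, node.end_lineno + 1))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Keep the definition header if it encloses any kept line.
            if any(node.lineno <= ln <= node.end_lineno for ln in kept_lines):
                support.add(node.lineno)
    return sorted(support)

SOURCE = """import os
import json

class Loader:
    def read(self, path):
        with open(path) as f:
            return json.load(f)

    def exists(self, path):
        return os.path.exists(path)
"""

# Suppose a semantic pruner kept only line 7 (the json.load call).
recovered = structural_support_lines(SOURCE, {7})
```

Here the recovered set adds the two imports, the `Loader` class header, and the `read` method header, while leaving the unrelated `exists` method pruned. A more selective variant could keep only the imports whose names appear in the kept lines.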
Topics
- Context Pruning
- Coding Agents
- Multi-Rubric Latent Reasoning
- AST-based Program Analysis
- Conditional Random Fields
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.