Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems) · Depth: Expert, extended

Summary

LaMR (Latent Multi-Rubric) is a structured pruning framework for LLM-powered coding agents that targets the excessive token consumption caused by irrelevant repository files. Existing pruners use a single-objective sequence labeler, which creates a modeling bottleneck: one Conditional Random Field (CRF) transition prior must handle both contiguous semantic spans and sparse structural support lines. LaMR decomposes code relevance into two interpretable quality dimensions, semantic evidence and dependency support, each modeled by a dedicated CRF with dimension-specific transition dynamics. A mixture-of-experts gating network dynamically weights the per-rubric emissions, and a final CRF layer produces the aggregate keep-or-prune decision. LaMR derives its multi-rubric labels from existing training corpora via AST-based program analysis, which also denoises the teacher's binary labels. In experiments on SWE-Bench Verified, SWE-QA, LCC, and LongCodeQA, LaMR wins 12 of 16 multi-turn comparisons, saves up to 31% more tokens, and improves Exact Match by up to +3.5 on single-turn tasks, often outperforming unpruned baselines.

Key takeaway

For research scientists developing LLM-powered coding agents, LaMR is a context-management approach worth evaluating. By explicitly separating semantic and dependency relevance, it reports up to 31% more token savings in multi-turn settings and up to +3.5 Exact Match on single-turn tasks compared to single-objective pruners. Consider integrating LaMR as middleware to improve efficiency and reasoning precision, especially with stronger backbone models like Claude Opus 4.6, where traditional pruning can increase token usage.

Key insights

Decomposing code relevance into semantic and dependency rubrics improves LLM coding agent context pruning.
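To make the two rubrics concrete, here is a minimal sketch of how a teacher's binary keep labels might be split into "dependency support" and "semantic evidence" using Python's standard ast module. The source does not describe LaMR's actual labeling rules, so the heuristic (imports and def/class signature lines count as dependency support, every other kept line counts as semantic evidence) and the names dependency_support_lines, rubric_labels, and SNIPPET are assumptions made purely for illustration.

```python
import ast

# Hypothetical snippet standing in for a repository file.
SNIPPET = """import os
from math import sqrt

def norm(x, y):
    return sqrt(x * x + y * y)

print(norm(3, 4))
"""


def dependency_support_lines(source):
    """Return 1-based line numbers treated as dependency support.

    Assumed heuristic (not from the paper): import statements and the
    first line of function/class definitions.
    """
    tree = ast.parse(source)
    lines = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            lines.update(range(node.lineno, node.end_lineno + 1))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            lines.add(node.lineno)  # signature line only
    return lines


def rubric_labels(source, teacher_keep):
    """Split a teacher's binary keep set into two per-line rubric labels."""
    dep = dependency_support_lines(source)
    labels = []
    for i in range(1, len(source.splitlines()) + 1):
        kept = i in teacher_keep
        labels.append({
            "line": i,
            "dependency": kept and i in dep,
            "semantic": kept and i not in dep,
        })
    return labels


# Suppose the teacher kept lines 2, 4 and 5.
labels = rubric_labels(SNIPPET, {2, 4, 5})
dependency_lines = [l["line"] for l in labels if l["dependency"]]
semantic_lines = [l["line"] for l in labels if l["semantic"]]
print(dependency_lines, semantic_lines)
```

In this toy case the import and the function signature land in the dependency rubric, while the function body lands in the semantic rubric. The sketch only splits labels; it does not attempt the label denoising the summary mentions.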

Principles

Method

LaMR uses parallel CRF heads for semantic and dependency relevance, dynamically weighted by a query-adaptive Mixture-of-Experts gate, with AST-derived labels and a final fused CRF for pruning.
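The fusion and decoding step can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the per-rubric emission scores, the gate logits, and the transition matrix are hand-picked numbers rather than learned parameters, the per-rubric CRF heads are reduced to their emission scores, and fuse_emissions and viterbi are hypothetical helper names. Labels are 0 = prune and 1 = keep.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a short list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_emissions(sem, dep, gate_logits):
    """Weight per-rubric emissions [prune, keep] by gate probabilities."""
    w_sem, w_dep = softmax(gate_logits)
    return [[w_sem * s[k] + w_dep * d[k] for k in range(2)]
            for s, d in zip(sem, dep)]


def viterbi(emissions, transitions):
    """Most likely 0/1 label path; transitions[i][j] scores i -> j."""
    score = list(emissions[0])
    back = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(2):
            cands = [score[i] + transitions[i][j] for i in range(2)]
            best = max(range(2), key=lambda i: cands[i])
            new_score.append(cands[best] + emissions[t][j])
            ptr.append(best)
        score = new_score
        back.append(ptr)
    last = max(range(2), key=lambda k: score[k])
    path = [last]
    for ptr in reversed(back):  # follow back-pointers to the first line
        last = ptr[last]
        path.append(last)
    return path[::-1]


# Toy scores for five lines: line 0 is an import (dependency head votes keep),
# lines 1-2 are the relevant body (semantic head votes keep).
sem = [[1.0, -1.0], [-1.0, 2.0], [-1.0, 2.0], [0.2, -0.2], [1.0, -1.0]]
dep = [[-1.0, 2.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, -1.0]]
transitions = [[0.5, -0.5], [-0.5, 0.5]]  # assumed prior favoring contiguous runs

balanced = viterbi(fuse_emissions(sem, dep, [0.0, 0.0]), transitions)
semantic_only = viterbi(fuse_emissions(sem, dep, [10.0, -10.0]), transitions)
print(balanced)       # [1, 1, 1, 0, 0]
print(semantic_only)  # [0, 1, 1, 0, 0]
```

With a balanced gate the import line survives because the dependency head supports it; when the gate puts nearly all weight on the semantic head, the same line is pruned. In LaMR the gate is described as query-adaptive, so these weights would be predicted per input rather than fixed by hand as they are here.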

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.