Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

The OSL-MR (Observability-Safe Learning for Memory Retention) framework addresses memory management in long-horizon language agents, which often accumulate more observations and facts than fit in their context windows. Existing memory systems typically treat retention as a local decision, overlooking its long-term consequences and realistic observability constraints. OSL-MR instead formulates memory retention as a constrained stochastic optimization problem that explicitly models budget feasibility, evidence utility, and delayed costs such as miss penalties and stale-information risk. It enforces a strict separation between online-observable features and offline-available supervision (OAS). The framework combines an evidence learner, trained offline from realized evidence supervision, with a Mixed-Score heuristic that serves both as a deployable online-safe baseline and as a structured inductive prior. Experiments on the LoCoMo and LongMemEval benchmarks show that OSL-MR consistently outperforms recency-based methods, Generative Agents-style scoring, and other heuristic baselines, particularly under tight memory budgets: on LoCoMo at budget 128, it reaches an F1 of 0.302 and a reward of 305.2, compared with 0.069 and 132.5 for Mixed-Score. Using Mixed-Score as a prior further improves precision while preserving recall, and a sensitivity analysis confirms that the results are robust across a range of cost configurations.
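To make the "budget feasibility, evidence utility, and delayed costs" framing concrete, here is a minimal sketch. It is not the paper's formulation. It assumes that each candidate memory has a known cost, an estimated probability of later being needed as evidence, a miss penalty, and a stale-information risk; all of these fields, and the budget unit, are hypothetical. Each item is scored by the expected miss penalty it avoids minus its stale risk, and the budget is filled greedily by value per unit cost. That greedy rule is a standard 0/1-knapsack approximation, used here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical memory item; every field is an illustrative assumption.
    item_id: str
    cost: int            # budget units consumed if kept (assumed, e.g. tokens or slots)
    p_evidence: float    # estimated probability the item is later needed as evidence
    miss_penalty: float  # delayed cost paid if it is needed but was dropped
    stale_risk: float    # expected cost of serving it after it has gone stale


def net_value(c: Candidate) -> float:
    # Keeping an item avoids its expected miss penalty but exposes stale-information risk.
    return c.p_evidence * c.miss_penalty - c.stale_risk


def select_under_budget(cands: list[Candidate], budget: int) -> list[str]:
    """Greedy value-per-cost fill of a 0/1 knapsack; an approximation, not guaranteed optimal."""
    worth_keeping = [c for c in cands if c.cost > 0 and net_value(c) > 0]
    worth_keeping.sort(key=lambda c: net_value(c) / c.cost, reverse=True)
    kept, used = [], 0
    for c in worth_keeping:
        if used + c.cost <= budget:  # budget feasibility constraint
            kept.append(c.item_id)
            used += c.cost
    return kept


cands = [
    Candidate("a", 40, 0.9, 10.0, 1.0),   # net value 8.0
    Candidate("b", 100, 0.5, 10.0, 0.5),  # net value 4.5
    Candidate("c", 30, 0.1, 10.0, 2.0),   # net value -1.0, never kept
]
print(select_under_budget(cands, 128))  # ['a']
print(select_under_budget(cands, 140))  # ['a', 'b']
```

The point of the example is that the tighter budget forces a choice that a purely local, per-item importance threshold would not make. Both "a" and "b" are worth keeping in isolation, but only one of them fits.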

Key takeaway

For machine learning engineers building long-horizon language agents, managing memory under tight budgets and partial observability is critical. Consider adopting the OSL-MR framework, which offers a principled, learning-based approach to memory retention. Its strict observability separation keeps the learned policy deployable, and its evidence-supervised learning substantially outperforms traditional heuristics, especially under tight budgets. Use a strong heuristic both for cold-start data collection and as an inductive prior for the learned policy to maximize precision and overall retention quality.

Key insights

Memory retention for long-horizon agents is a constrained stochastic optimization problem requiring observability-safe learning.

Principles

Memory retention is a long-horizon sequential decision.
Strictly separate online-observable features from offline supervision.
Static importance scores often mismatch query-specific evidence.
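The second principle can be enforced in code by giving online features and offline supervision separate types, so that the decision-time path cannot read a label even by accident. The following is a minimal sketch under that assumption. The type names, the three example features, and the `was_gold_evidence` label are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OnlineFeatures:
    # Hypothetical signals computable at write time, before any future query is known.
    age_turns: int
    length_tokens: int
    heuristic_score: float


@dataclass(frozen=True)
class OfflineLabel:
    # Hypothetical supervision that exists only after the episode ends.
    was_gold_evidence: bool


@dataclass(frozen=True)
class LoggedMemory:
    online: OnlineFeatures
    label: Optional[OfflineLabel] = None  # attached offline; never read at decision time


def policy_input(f: OnlineFeatures) -> tuple[float, ...]:
    # The deployed policy's only input type is OnlineFeatures, so labels cannot leak in.
    return (float(f.age_turns), float(f.length_tokens), f.heuristic_score)


def training_pair(m: LoggedMemory) -> tuple[tuple[float, ...], int]:
    # Offline training joins the same online view with the realized label.
    if m.label is None:
        raise ValueError("record has no offline label and cannot be used for training")
    return policy_input(m.online), int(m.label.was_gold_evidence)


m = LoggedMemory(OnlineFeatures(3, 20, 0.5), OfflineLabel(True))
print(training_pair(m))  # ((3.0, 20.0, 0.5), 1)
```

Because training reuses `policy_input`, the features the learner is fit on are exactly the features it will see at deployment. This rules out a common source of train/serve skew.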

Method

OSL-MR integrates a constrained optimization formulation, an evidence learner trained offline from interaction logs, and a Mixed-Score heuristic used both for cold-start data collection and as an inductive prior.
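This summary does not define Mixed-Score. The sketch below therefore shows only one plausible shape for such a heuristic, modeled on Generative Agents-style scoring, which is a weighted sum of recency, importance, and relevance. The weights, the per-turn decay, and the function names are all assumptions. Relevance is computed against the current conversational context rather than a future query. That keeps every input online-observable, so the same score can serve as a deployable baseline policy and, later, as a prior.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def mixed_score(age_turns: int, importance: float,
                item_vec: list[float], context_vec: list[float],
                decay: float = 0.99, w_rec: float = 1.0,
                w_imp: float = 1.0, w_rel: float = 1.0) -> float:
    """Hypothetical mixed heuristic: weighted recency + importance + relevance."""
    recency = decay ** age_turns               # exponential decay per turn, in (0, 1]
    relevance = cosine(item_vec, context_vec)  # similarity to the current context only
    return w_rec * recency + w_imp * importance + w_rel * relevance


print(mixed_score(0, 0.5, [1.0, 0.0], [1.0, 0.0]))  # 2.5
```

A static `importance` term like this one is exactly what the third principle warns about: it is fixed when the item is written, so it cannot anticipate which query will eventually need the item.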

In practice

Deploy a strong heuristic for cold-start data collection.
Train an evidence learner offline using gold evidence labels.
Incorporate a heuristic score as an inductive prior for learning.
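The three steps above can be read together as logistic regression in which the heuristic score enters the logit as a fixed offset, so the learner only has to fit a correction on top of the prior. This is a minimal sketch under that assumption and is not the paper's evidence learner. The function names, the offset weight `lam`, and the plain gradient-descent loop are hypothetical. The logged rows are assumed to come from episodes in which the heuristic was the deployed policy (the cold-start step). Gold-evidence labels are used only during training, and `retain_prob` reads only online features plus the heuristic score.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_evidence_learner(X, y, prior, lam=1.0, lr=0.5, epochs=1000):
    """Offline logistic regression with the heuristic score as a fixed logit offset.

    X: rows of online-observable features logged during heuristic-driven episodes.
    y: 0/1 gold-evidence labels, available only offline.
    prior: heuristic score for each row, added to the logit with fixed weight lam.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi, pi in zip(X, y, prior):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b + lam * pi
            err = sigmoid(z) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def retain_prob(x, prior_score, w, b, lam=1.0):
    # Decision-time score: reads only online features and the heuristic prior.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b + lam * prior_score)


X = [[1.0], [0.9], [0.1], [0.0]]
y = [1, 1, 0, 0]
prior = [0.0, 0.0, 0.0, 0.0]
w, b = train_evidence_learner(X, y, prior)
print(retain_prob([1.0], 0.0, w, b), retain_prob([0.0], 0.0, w, b))
```

Treating the prior as an offset rather than as just another feature is a design choice. With little labeled data the model falls back to the heuristic's ranking, and the learned weights only have to move it where the evidence labels disagree.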

Topics

Long-Horizon Language Agents
Memory Retention
Constrained Optimization
Observability-Safe Learning
Large Language Models
Resource Allocation

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.