MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought

Source: cs.MA updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

MemCoT is a novel test-time memory scaling framework designed to address severe hallucinations and catastrophic forgetting in Large Language Models (LLMs) when performing causal reasoning over extensive, fragmented contexts. Unlike traditional static, single-step retrieval mechanisms, MemCoT redefines long-context reasoning as an iterative, stateful information search process. It incorporates a multi-view long-term memory perception module that enables "Zoom-In" evidence localization and "Zoom-Out" contextual expansion, allowing the model to precisely identify relevant evidence and reconstruct surrounding causal structures. Additionally, MemCoT utilizes a task-conditioned dual short-term memory system, comprising semantic state memory and episodic trajectory memory, to record historical search decisions and dynamically guide query decomposition and pruning across iterations. Empirical evaluations show MemCoT achieves state-of-the-art performance on the LoCoMo and LongMemEval-S benchmarks, with GPT-4o-mini reaching an F1 score of 58.03% on LoCoMo.
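
To make the "Zoom-In" and "Zoom-Out" ideas concrete, here is a minimal sketch of the two operations over a toy memory of ordered text chunks. It is an illustration under stated assumptions, not the authors' implementation: the `Chunk` type, the word-overlap scoring, the `radius` parameter, and the sample data are all hypothetical, and a real system would likely use embeddings rather than word overlap.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int  # position of the chunk in the original conversation order
    text: str


def tokens(text: str) -> set[str]:
    """Lowercased word set; a toy stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def zoom_in(query: str, memory: list[Chunk], k: int = 1) -> list[Chunk]:
    """Evidence localization: the k chunks sharing the most words with the query."""
    q = tokens(query)
    ranked = sorted(memory, key=lambda c: len(q & tokens(c.text)), reverse=True)
    return ranked[:k]


def zoom_out(hit: Chunk, memory: list[Chunk], radius: int = 1) -> list[Chunk]:
    """Contextual expansion: the hit plus its neighbours in original order.

    Assumes memory[i].index == i, which holds for the toy data below.
    """
    lo = max(0, hit.index - radius)
    hi = min(len(memory), hit.index + radius + 1)
    return memory[lo:hi]


# Hypothetical conversation fragments, stored in chronological order.
memory = [Chunk(i, t) for i, t in enumerate([
    "Alice said she was tired of her commute.",
    "Alice accepted a job offer in Lisbon.",
    "Alice sold her car the following week.",
    "Bob adopted a dog named Pixel.",
])]

hit = zoom_in("Why did Alice sell her car?", memory)[0]
context = zoom_out(hit, memory)
```

Here `zoom_in` pins down the chunk about the car sale, and `zoom_out` then pulls in the preceding chunk about the job offer, which is the kind of surrounding causal context a single-step retrieval would miss.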

Key takeaway

For AI Engineers and Research Scientists developing LLM agents for long-context reasoning, MemCoT offers a framework for mitigating hallucinations and forgetting. Consider adopting an iterative, stateful, memory-driven approach that combines multi-view perception with dynamic short-term memory to improve reasoning coherence and accuracy. The reported results suggest that this shift from passive retrieval to active memory management can improve performance on complex multi-hop and temporal reasoning tasks.

Key insights

MemCoT transforms LLM long-context reasoning into an iterative, stateful memory search process, significantly reducing hallucinations and forgetting.

Method

MemCoT employs a recurrent memory-reasoning loop with a multi-view long-term memory perception module (zoom-in, zoom-out, visual grounding) and a dual short-term memory system for dynamic query evolution (decomposition, pruning).
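
The recurrent loop described above can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: the `ShortTermMemory` class, the dictionary-based `search` and `decompose` helpers, the `max_steps` limit, and the sample data are all assumptions made for the example, standing in for retrieval and LLM calls.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ShortTermMemory:
    semantic_state: list[str] = field(default_factory=list)  # facts established so far
    trajectory: list[str] = field(default_factory=list)      # queries already issued


def search(query: str, store: dict[str, str]) -> Optional[str]:
    """Toy long-term memory lookup: exact key match in a dict."""
    return store.get(query)


def decompose(fact: str, followups: dict[str, list[str]]) -> list[str]:
    """Toy query decomposition: follow-up queries triggered by a new fact."""
    return followups.get(fact, [])


def memory_loop(question: str, store: dict[str, str],
                followups: dict[str, list[str]], max_steps: int = 5) -> ShortTermMemory:
    stm = ShortTermMemory()
    frontier = [question]
    for _ in range(max_steps):
        # Pruning: drop queries the trajectory memory says were already tried.
        frontier = [q for q in frontier if q not in stm.trajectory]
        if not frontier:
            break
        query = frontier.pop(0)
        stm.trajectory.append(query)
        fact = search(query, store)
        if fact is None:
            continue
        stm.semantic_state.append(fact)
        # Decomposition: the new fact may raise further sub-queries.
        frontier.extend(decompose(fact, followups))
    return stm


# Hypothetical long-term memory and follow-up rules.
store = {
    "where does Alice work?": "Alice works in Lisbon",
    "when did Alice move to Lisbon?": "Alice moved in March",
}
followups = {
    "Alice works in Lisbon": ["when did Alice move to Lisbon?", "where does Alice work?"],
}

result = memory_loop("where does Alice work?", store, followups)
```

In this run the second follow-up repeats the original question, so the trajectory memory prunes it, while the semantic state accumulates both retrieved facts across iterations.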

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.