What is the perfect memory architecture? | Sam Whitmore
Summary
New Computer, creators of the conversational journal "Dot," have been developing AI memory architectures since 2023. They began with fact extraction, which proved insufficient for capturing conversational nuance. They then moved to a "universal memory architecture" of linked entities and schemas, represented as JSON blobs, but found that it created too much cognitive overhead for both users and models. This prompted a re-evaluation of memory principles and resulted in four memory systems that run in parallel: Theory of Mind (user identity), Episodic Memory (daily events), Entities (refined schemas), and Procedural Memory (situational similarity and learned workflows). Their 2024 retrieval pipeline parallelizes queries across these systems, using hybrid search for entities and loading behavioral modules for procedural memory. The product is now shifting towards "Dots," a hive-mind AI that remembers groups and their relationships, which introduces new challenges in representing interpersonal dynamics. With advances such as 1M-token context windows, they are experimenting with real-time Q&A, which reduces the need for episodic and entity-level compression and shifts the emphasis to raw data and procedural memory for insights.
Key takeaway
For AI engineers building conversational agents, the "perfect" memory architecture is a moving target. Design your memory systems from first principles and align them directly with your product's core functionality. Re-evaluate your approach as the underlying LLM technology advances: larger context windows and real-time processing may reduce the need for complex compression and retrieval, leaving procedural learning and insights as the main focus.
Key insights
AI memory architectures must adapt to evolving technology and product goals, prioritizing raw data and procedural learning.
Principles
- Raw data is the best source of truth.
- Memory systems should be distinct and parallel.
- Product goals drive memory architecture design.
Method
New Computer developed four parallel memory systems (Theory of Mind, Episodic, Entities, Procedural) with parallelized retrieval, and adapted their compression strategies as LLM context windows and costs changed.
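The parallelized retrieval described above can be sketched roughly as follows. This is a minimal illustration, not New Computer's implementation: the in-memory stores, the function names, and the keyword matching that stands in for hybrid search are all assumptions made for the example.

```python
import asyncio

# Hypothetical in-memory stores standing in for real storage backends.
THEORY_OF_MIND = ["User is a designer who prefers concise answers."]
EPISODES = {"2024-05-01": "Talked about moving to Lisbon."}
ENTITIES = {"Lisbon": "City the user plans to move to.", "Maya": "User's sister."}
BEHAVIOR_MODULES = {"travel": "Ask about dates before suggesting plans."}


async def theory_of_mind(query: str) -> list[str]:
    # Who the user is; returned regardless of the query in this sketch.
    return THEORY_OF_MIND


async def episodic(query: str) -> list[str]:
    # Daily events; a real system would filter by date or relevance.
    return [f"{day}: {text}" for day, text in EPISODES.items()]


async def entities(query: str) -> list[str]:
    # Stand-in for hybrid search (assumption): keyword match on entity names.
    words = {w.strip("?.,!").lower() for w in query.split()}
    return [f"{name}: {desc}" for name, desc in ENTITIES.items() if name.lower() in words]


async def procedural(query: str) -> list[str]:
    # Load behavioral modules whose trigger word appears in the query.
    return [rule for trigger, rule in BEHAVIOR_MODULES.items() if trigger in query.lower()]


async def retrieve(query: str) -> dict[str, list[str]]:
    # Query all four memory systems concurrently and merge the results by name.
    names = ["theory_of_mind", "episodic", "entities", "procedural"]
    results = await asyncio.gather(
        theory_of_mind(query), episodic(query), entities(query), procedural(query)
    )
    return dict(zip(names, results))


context = asyncio.run(retrieve("Any travel tips for Lisbon?"))
```

Keeping each system behind its own coroutine means one can be swapped or removed without touching the others, which fits the principle that memory systems should be distinct and parallel.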
In practice
- Consider a multi-system memory approach.
- Prioritize raw data over compressed artifacts.
- Re-evaluate memory infrastructure with new LLM capabilities.
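One way to act on the last two points is to check whether the raw history fits the model's context window before reaching for compressed artifacts. The sketch below is illustrative only: the 4-characters-per-token estimate, the `reserve` budget, and the strategy names are assumptions, not figures from the talk.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def choose_strategy(raw_messages: list[str], context_window: int, reserve: int = 8000) -> str:
    # Prefer raw data when it fits; `reserve` leaves room for the prompt and the reply.
    total = sum(estimate_tokens(m) for m in raw_messages)
    if total <= context_window - reserve:
        return "raw_context"
    return "compressed_retrieval"


history = ["a" * 4000] * 10  # about 10,000 estimated tokens
large_window = choose_strategy(history, context_window=1_000_000)
small_window = choose_strategy(history, context_window=16_000)
```

With these assumed numbers, the same history is loaded raw for the large window and routed to compressed retrieval for the small one, so the decision changes as model capabilities change rather than being fixed in the architecture.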
Topics
- AI Memory Architectures
- Conversational AI
- Retrieval Pipelines
- Procedural Memory
- Large Context Windows
Best for: AI Engineer, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Greg Kamradt.