From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG
Summary
EPIC (Efficient Preference-aligned Index Construction) is a method that lets on-device Large Language Model (LLM) agents manage personal context under tight memory constraints. It addresses the problem of deciding what to store so that retrieval aligns with user preferences, which it treats as a compact and stable form of personal context. EPIC integrates these preferences throughout the Retrieval-Augmented Generation (RAG) pipeline: it retains only preference-relevant data at indexing time and steers retrieval toward preference-aligned contexts. On benchmarks covering conversations, debates, explanations, and recommendations, EPIC reduces indexing memory by a factor of 2,404, improves preference-following accuracy by 20.17 percentage points, and lowers retrieval latency by a factor of 33.33 compared to the best baseline. In an on-device experiment, EPIC kept its memory footprint under 1 MB with 29.35 ms/query latency during streaming updates.
Key takeaway
For NLP engineers building on-device LLM agents that need personalized context, a preference-aligned indexing approach like EPIC can sharply reduce memory footprint and retrieval latency. Consider integrating user preferences directly into your RAG pipeline to improve both efficiency and preference-following accuracy, especially in privacy-sensitive applications.
Key insights
On-device LLM agents can use preference-aligned indexing to optimize memory and retrieval for personal context.
Principles
- User preferences are stable personal context.
- Integrate preferences throughout the RAG pipeline.
Method
EPIC selectively retains preference-relevant data from raw input and steers retrieval toward preference-aligned contexts to optimize on-device RAG performance.
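The two ideas in this description, filtering at indexing time and biasing retrieval toward preferences, can be illustrated with a minimal sketch. This is not EPIC's actual algorithm, which the summary does not specify. The class `PreferenceIndex`, the parameters `keep_threshold` and `alpha`, and the bag-of-words similarity are all hypothetical stand-ins chosen so the example runs with the Python standard library only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class PreferenceIndex:
    """Hypothetical index that stores only chunks related to a user preference."""

    def __init__(self, preferences, keep_threshold=0.2, alpha=0.5):
        if not preferences:
            raise ValueError("at least one preference is required")
        self.prefs = [(p, embed(p)) for p in preferences]
        self.keep_threshold = keep_threshold  # assumed cutoff for retention
        self.alpha = alpha  # assumed weight of preference relevance at query time
        self.entries = []  # (chunk, vector, preference id, preference score)

    def add(self, chunk: str) -> bool:
        """Index the chunk only if it is close enough to some preference."""
        vec = embed(chunk)
        scores = [cosine(vec, pv) for _, pv in self.prefs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < self.keep_threshold:
            return False  # dropped: never occupies index memory
        self.entries.append((chunk, vec, best, scores[best]))
        return True

    def retrieve(self, query: str, k: int = 1):
        """Rank by query similarity plus a bonus for preference relevance."""
        qvec = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda e: cosine(qvec, e[1]) + self.alpha * e[3],
            reverse=True,
        )
        return [(chunk, self.prefs[pid][0]) for chunk, _, pid, _ in ranked[:k]]


# Usage: three incoming chunks, of which only two relate to a preference.
preferences = ["I prefer vegetarian food", "I dislike loud crowded places"]
chunks = [
    "The new vegetarian ramen place downtown has great food",
    "The train to the airport leaves at 9 am",
    "That bar is loud and crowded on weekends",
]
index = PreferenceIndex(preferences)
kept = [index.add(c) for c in chunks]
hits = index.retrieve("Where should I eat food tonight?", k=1)
```

In this toy run the train schedule is never stored because it matches no preference, and the food query returns the ramen chunk along with the preference it supports. A real system would replace `embed` with a learned encoder and tune the threshold against a memory budget.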
In practice
- Reduce indexing memory by a factor of 2,404.
- Improve preference-following accuracy by 20.17 percentage points.
- Achieve 29.35 ms/query latency on-device.
Topics
- On-Device RAG
- Preference-Aligned Memory
- EPIC Index Construction
- Large Language Models
- Memory Efficiency
Best for: Research Scientist, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.