From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

EPIC (Efficient Preference-aligned Index Construction) is a method for on-device Large Language Model (LLM) agents that must manage personal context under tight memory constraints. It addresses the problem of deciding what to store so that retrieval stays aligned with user preferences, which it treats as a compact and stable form of personal context. EPIC applies these preferences throughout the Retrieval-Augmented Generation (RAG) pipeline: it retains only preference-relevant data and aligns retrieval accordingly. Across benchmarks covering conversations, debates, explanations, and recommendations, EPIC reduces indexing memory by a factor of 2,404, improves preference-following accuracy by 20.17 percentage points, and achieves 33.33 times lower retrieval latency compared to the best baseline. In an on-device experiment, EPIC kept its memory footprint under 1 MB with a latency of 29.35 ms per query during streaming updates.

Key takeaway

For NLP engineers building on-device LLM agents that need personalized context, a preference-aligned indexing approach like EPIC can drastically reduce both memory footprint and retrieval latency. Consider integrating user preferences directly into your RAG pipeline to improve efficiency and the accuracy of contextually relevant responses, especially in privacy-sensitive applications.

Key insights

On-device LLM agents can use preference-aligned indexing to optimize memory and retrieval for personal context.

Principles

Method

EPIC retains only the preference-relevant data from raw input and steers retrieval toward preference-aligned contexts to optimize on-device RAG performance.
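
The two steps described above (filter at indexing time, then bias retrieval toward preferences) can be illustrated with a minimal sketch. This is not EPIC's actual implementation: the bag-of-words `embed` function, the `keep_threshold` and `pref_weight` parameters, and the function names `build_index` and `retrieve` are all hypothetical stand-ins chosen for illustration, and a real system would presumably use a learned embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks, preferences, keep_threshold=0.2):
    """Store only chunks whose best similarity to any preference clears the threshold."""
    pref_vecs = [embed(p) for p in preferences]
    index = []
    for chunk in chunks:
        vec = embed(chunk)
        best = max((cosine(vec, p) for p in pref_vecs), default=0.0)
        if best >= keep_threshold:  # hypothetical cutoff, not taken from the paper
            index.append((chunk, vec))
    return index


def retrieve(index, query, preferences, top_k=1, pref_weight=0.5):
    """Rank stored chunks by query similarity plus a weighted preference bonus."""
    query_vec = embed(query)
    pref_vecs = [embed(p) for p in preferences]

    def score(item):
        _, vec = item
        pref = max((cosine(vec, p) for p in pref_vecs), default=0.0)
        return cosine(vec, query_vec) + pref_weight * pref

    ranked = sorted(index, key=score, reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


preferences = ["prefers vegetarian food"]
chunks = [
    "The new vegetarian ramen place downtown has great food.",
    "Stock markets closed slightly higher on Tuesday.",
    "A steakhouse opened near the station.",
]
index = build_index(chunks, preferences)
results = retrieve(index, "where should I eat food tonight", preferences)
```

In this toy run only the first chunk overlaps with the stated preference, so it is the only one stored. The memory saving comes from discarding the other chunks before they ever reach the index, which is the general idea the summary attributes to EPIC.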

Best for: Research Scientist, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.