Improve Large Language Model Systems with User Logs
Summary
The UNO (User log-driveN Optimization) framework, developed by Changyue Wang and Qingyao Ai from Tsinghua University, addresses the challenges of continually improving large language model systems (LLMsys) using noisy, unstructured user interaction logs. UNO distills raw logs into semi-structured rules and preference pairs, then employs query-and-feedback-driven clustering to manage data heterogeneity. A key innovation is quantifying the "cognitive gap" between the model's prior knowledge and log data, using a threshold of 0.45 to classify clusters. This assessment guides the LLMsys to adaptively filter noisy feedback and construct either a Primary Experience Module (Expert LoRA) or a Reflective Experience Module (Critic LoRA). Evaluated on MemoryBench using Qwen3-8B and phi-4 (14B) models, UNO consistently achieved leading effectiveness and efficiency, significantly outperforming Retrieval Augmented Generation (RAG) and memory-based baselines across four task-based datasets. The framework also supports online evolution, demonstrating steady performance improvements.
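The routing step can be pictured as follows. This is a minimal sketch under stated assumptions, not UNO's implementation. The summary gives only the 0.45 threshold. The gap metric here is a hypothetical proxy: the share of a cluster's preference pairs where the base model already scores the user-rejected response at least as high as the user-preferred one. Which side of the threshold maps to which module is also an assumption. Names such as `cognitive_gap`, `route_cluster`, and `toy_score` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List

GAP_THRESHOLD = 0.45  # value reported in the summary; everything else is assumed


@dataclass
class PreferencePair:
    query: str
    chosen: str    # response the user endorsed
    rejected: str  # response the user rejected


def cognitive_gap(pairs: List[PreferencePair],
                  score: Callable[[str, str], float]) -> float:
    """Hypothetical proxy: fraction of pairs where the model's own score
    does not prefer the user's chosen response."""
    if not pairs:
        return 0.0
    disagreements = sum(
        1 for p in pairs if score(p.query, p.rejected) >= score(p.query, p.chosen)
    )
    return disagreements / len(pairs)


def route_cluster(pairs: List[PreferencePair],
                  score: Callable[[str, str], float],
                  threshold: float = GAP_THRESHOLD) -> str:
    gap = cognitive_gap(pairs, score)
    # ASSUMPTION: a low gap means logs agree with the model's prior, so train an
    # Expert LoRA directly; a high gap routes the cluster to a Critic LoRA.
    return "expert_lora" if gap <= threshold else "critic_lora"


def toy_score(query: str, response: str) -> float:
    # Stand-in for a model log-likelihood; here simply the response length.
    return float(len(response))


cluster = [
    PreferencePair("q1", "a detailed answer", "short"),
    PreferencePair("q2", "no", "a long wrong answer"),
]
print(route_cluster(cluster, toy_score))  # critic_lora (gap = 0.5)
```

In practice, `score` would be the base model's likelihood of a response given the query. The toy length score only makes the arithmetic easy to follow.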
Key takeaway
For MLOps engineers deploying LLMs: if you struggle to improve your system continuously from real-world user interactions, consider an adaptive framework like UNO. The approach works in three stages. First, distill user logs into actionable rules and preference pairs. Next, cluster them by query and feedback and assess each cluster's cognitive gap. Finally, apply either direct parameter updates (Expert LoRA) or critique-based refinement (Critic LoRA) to each cluster. This reduces the risk of learning from noisy feedback and enables lifelong learning without relying on extensive external context.
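Critique-based refinement can be sketched as a generate, critique, revise loop. This is a hypothetical illustration. In UNO the critic role belongs to the Critic LoRA. Here both the generator and the critic are plain callables, and the round budget `max_rounds` is an assumed parameter.

```python
from typing import Callable, Tuple

# Hypothetical interfaces standing in for the base LLM and a Critic LoRA.
Generator = Callable[[str, str], str]              # (query, feedback) -> draft
Critic = Callable[[str, str], Tuple[bool, str]]    # (query, draft) -> (accepted, critique)


def refine(query: str, generate: Generator, critique: Critic,
           max_rounds: int = 3) -> Tuple[str, int]:
    """Revise a draft using critic feedback until the critic accepts it or the
    round budget runs out. Returns (final draft, number of critique rounds)."""
    feedback = ""
    draft = generate(query, feedback)
    for round_idx in range(1, max_rounds + 1):
        accepted, feedback = critique(query, draft)
        if accepted:
            return draft, round_idx
        draft = generate(query, feedback)  # revise using the critique
    return draft, max_rounds


def toy_generate(query: str, feedback: str) -> str:
    # Toy generator that only "learns" to uppercase when told to.
    return query.upper() if "uppercase" in feedback else query


def toy_critic(query: str, draft: str) -> Tuple[bool, str]:
    ok = draft.isupper()
    return (ok, "" if ok else "use uppercase")


print(refine("hello", toy_generate, toy_critic))  # ('HELLO', 2)
```

A fixed round budget keeps inference cost bounded. This matters because every extra round is another model call.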
Key insights
Continual LLM improvement from user logs requires adaptive strategies to manage noise and cognitive gaps.
Principles
- User logs contain valuable, authentic feedback for LLM evolution.
- Distinguish useful signals from noise via cognitive gap assessment.
- Adaptive module selection improves robustness and efficiency.
Method
UNO preprocesses logs into rules and preference pairs, clusters them by query and feedback, and assesses each cluster's cognitive gap. Depending on the cluster's characteristics, it then trains either an Expert LoRA for direct generation or a Critic LoRA for iterative refinement.
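The preprocessing step might look like the sketch below. It assumes a hypothetical log schema with a query, the model's response, an optional user correction, and an optional free-text comment. The extraction logic is a simple illustrative heuristic, not UNO's distillation procedure. Comments become semi-structured rules, and corrections become preference pairs that favor the user's answer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LogEntry:  # hypothetical raw log schema
    query: str
    response: str
    correction: Optional[str] = None  # user-supplied better answer
    comment: Optional[str] = None     # free-text feedback


@dataclass
class PreferencePair:
    query: str
    chosen: str
    rejected: str


def distill(logs: List[LogEntry]) -> Tuple[List[str], List[PreferencePair]]:
    """Turn raw logs into (rules, preference pairs) with a toy heuristic."""
    rules: List[str] = []
    pairs: List[PreferencePair] = []
    for entry in logs:
        if entry.comment:
            # Keep explicit user guidance as a semi-structured rule.
            rules.append(f"When asked '{entry.query}': {entry.comment.strip()}")
        if entry.correction and entry.correction != entry.response:
            # The user's correction is preferred over the model's response.
            pairs.append(PreferencePair(entry.query, entry.correction, entry.response))
        # Entries with no actionable signal are dropped.
    return rules, pairs


logs = [
    LogEntry("capital of Australia?", "Sydney",
             correction="Canberra", comment="Double-check capitals."),
    LogEntry("2+2?", "4"),
]
rules, pairs = distill(logs)
print(rules)  # ["When asked 'capital of Australia?': Double-check capitals."]
```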
In practice
- Cluster user log data on two features: the query and the feedback.
- Quantify the model's cognitive gap relative to the log data.
- Use LoRA for parameter-efficient adaptation.
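Dual-feature clustering can be illustrated with plain k-means over a concatenation of query and feedback embeddings. This is a sketch under assumptions. The equal weighting (`w_query=0.5`), the per-view L2 normalization, the farthest-point initialization, and the toy 2-D "embeddings" are our choices, not details from the paper. Any real embedding model could supply the vectors.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dual_feature(query_vec: Sequence[float], feedback_vec: Sequence[float],
                 w_query: float = 0.5) -> List[float]:
    """Concatenate normalized query and feedback vectors (assumed weighting)."""
    return ([w_query * x for x in _normalize(query_vec)]
            + [(1 - w_query) * x for x in _normalize(feedback_vec)])


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means with deterministic farthest-point initialization."""
    if not points:
        return []
    k = min(k, len(points))
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: _sqdist(p, centroids[c])) for p in points]
        # Update step: mean of members; an empty cluster keeps its centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


logs = [
    ([1.0, 0.0], [1.0, 0.0]),  # math query, negative feedback
    ([0.9, 0.1], [1.0, 0.0]),  # math query, negative feedback
    ([0.0, 1.0], [0.0, 1.0]),  # coding query, positive feedback
    ([0.1, 0.9], [0.0, 1.0]),  # coding query, positive feedback
]
points = [dual_feature(q, f) for q, f in logs]
labels = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1]
```

Normalizing each view before concatenation keeps one embedding's scale from dominating the distance. The weight then sets how much feedback, as opposed to topic, drives the grouping.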
Topics
- Large Language Models
- Continual Learning
- User Log Optimization
- Cognitive Gap
- LoRA
- Retrieval-Augmented Generation
Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.