Improve Large Language Model Systems with User Logs

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

UNO (User log-driveN Optimization), a framework developed by Changyue Wang and Qingyao Ai of Tsinghua University, targets the continual improvement of large language model systems (LLMsys) from noisy, unstructured user interaction logs. UNO first distills raw logs into semi-structured rules and preference pairs, then groups them with query-and-feedback-driven clustering to manage data heterogeneity. Its key innovation is to quantify the "cognitive gap" between the model's prior knowledge and the log data, classifying clusters against a threshold of 0.45. That assessment guides the LLMsys in adaptively filtering noisy feedback and in constructing either a Primary Experience Module (Expert LoRA) or a Reflective Experience Module (Critic LoRA). On MemoryBench, with Qwen3-8B and phi-4 (14B) as base models, UNO is reported to lead in both effectiveness and efficiency, significantly outperforming Retrieval-Augmented Generation (RAG) and memory-based baselines across four task-based datasets. The framework also supports online evolution and shows steady performance improvements in that setting.

Key takeaway

If you are an MLOps engineer struggling to improve a deployed LLM continuously from real-world user interactions, consider an adaptive framework like UNO. It distills user logs into actionable rules and preference pairs, clusters them, measures the cognitive gap of each cluster, and then applies either direct parameter updates or critique-based refinement. This approach mitigates the risk of learning from noisy feedback and enables lifelong learning without relying on extensive external context.

Key insights

Continual LLM improvement from user logs requires adaptive strategies to manage noise and cognitive gaps.

Method

UNO preprocesses logs into rules and preference pairs, clusters them, and assesses the cognitive gap of each cluster. Depending on the cluster's characteristics, it then trains either an Expert LoRA for direct generation or a Critic LoRA for iterative refinement.
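
The routing step in this pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation. Only the 0.45 threshold and the two module names come from the summary; the `Cluster` type, the `route_cluster` and `plan_training` helpers, the assumption that the gap is a score in [0, 1], and in particular the direction of the rule (small gap to the Expert LoRA, large gap to the Critic LoRA) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List

# Threshold reported in the summary for classifying clusters.
GAP_THRESHOLD = 0.45


class Module(Enum):
    EXPERT_LORA = "primary_experience_module"     # direct generation
    CRITIC_LORA = "reflective_experience_module"  # iterative refinement


@dataclass
class Cluster:
    """Hypothetical container for one cluster of distilled log data."""
    cluster_id: int
    cognitive_gap: float  # ASSUMPTION: a score in [0, 1]


def route_cluster(cluster: Cluster, threshold: float = GAP_THRESHOLD) -> Module:
    """Pick which module to train for a cluster.

    ASSUMPTION: the direction of this rule is illustrative only. Here a gap
    below the threshold goes to the Expert LoRA and a gap at or above it
    goes to the Critic LoRA; the summary does not state which way it runs.
    """
    if cluster.cognitive_gap < threshold:
        return Module.EXPERT_LORA
    return Module.CRITIC_LORA


def plan_training(clusters: List[Cluster]) -> Dict[int, Module]:
    """Map each cluster id to the module it would be used to train."""
    return {c.cluster_id: route_cluster(c) for c in clusters}


if __name__ == "__main__":
    demo = [Cluster(0, 0.20), Cluster(1, 0.45), Cluster(2, 0.80)]
    for cluster_id, module in plan_training(demo).items():
        print(cluster_id, module.name)
```

Keeping the threshold as a parameter rather than a hard-coded comparison makes it easy to flip or tune the rule once the actual criterion from the paper is confirmed.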

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.