MemRerank: Preference Memory for Personalized Product Reranking

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

MemRerank is a preference memory framework for personalized product reranking in LLM-based shopping agents. Rather than feeding long, noisy purchase histories directly to the reranker, it distills user preferences into concise, query-independent signals. A memory extractor trained with reinforcement learning (RL), supervised by downstream reranking performance, generates structured within-category and cross-category shopping preferences. On an end-to-end benchmark built around an LLM-based 1-in-5 selection task, MemRerank consistently outperformed baselines, with gains of up to +10.61 absolute accuracy points with the o4-mini reranker and +6.60 points with GPT-4.1-mini, especially when combined with "think tags" for explicit reasoning. These results suggest it is a practical building block for agentic e-commerce personalization.

Key takeaway

AI engineers building agentic e-commerce recommender systems should prioritize explicit preference memory modules such as MemRerank. Feeding raw purchase histories directly to an LLM is often suboptimal; distill user preferences into concise, query-independent signals instead. Consider training the memory extractor with reinforcement learning, using downstream reranking accuracy as the direct optimization target, and use semi-structured prompts for better extraction quality. In the reported experiments this approach yielded gains of up to +10.61 absolute accuracy points on the 1-in-5 reranking task.
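To make the takeaway concrete, here is a minimal sketch of how a distilled memory might replace raw purchase history in a reranker prompt, and how accuracy on a 1-in-5 selection task could be scored. The `PreferenceMemory` fields, the prompt wording, and both function names are illustrative assumptions, not the paper's actual prompts or code.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PreferenceMemory:
    """Hypothetical container for the two memory types named in the summary."""
    within_category: str   # preferences specific to the queried category
    cross_category: str    # preferences that hold across categories


def build_rerank_prompt(memory: PreferenceMemory, query: str,
                        candidates: Sequence[str]) -> str:
    """Assemble a reranker prompt from distilled memory (wording is assumed)."""
    if len(candidates) != 5:
        raise ValueError("the 1-in-5 task expects exactly five candidates")
    lines: List[str] = [
        "User preference memory:",
        f"- Within-category: {memory.within_category}",
        f"- Cross-category: {memory.cross_category}",
        f"Query: {query}",
        "Candidates:",
    ]
    # Number candidates 1..5 so the reranker can answer with a single index.
    lines += [f"{i}. {title}" for i, title in enumerate(candidates, start=1)]
    lines.append("Answer with the number of the product the user is most likely to buy.")
    return "\n".join(lines)


def selection_accuracy(predicted: Sequence[int], targets: Sequence[int]) -> float:
    """Percentage of examples where the chosen index equals the target index."""
    if not targets or len(predicted) != len(targets):
        raise ValueError("predictions and targets must be non-empty and aligned")
    hits = sum(p == t for p, t in zip(predicted, targets))
    return 100.0 * hits / len(targets)
```

Because the memory is query-independent, the same `PreferenceMemory` instance could in principle be reused across many queries for a user, with only the query and candidate list changing per request.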

Key insights

Distilling user purchase history into concise, query-independent preference memory significantly boosts LLM-based product reranking accuracy.

Method

MemRerank extracts structured within-category and cross-category preference memory from purchase history using an LLM, then trains this extractor via GRPO with a reward function combining format adherence and downstream 1-in-5 reranking accuracy.
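The reward design described above can be sketched as follows. This is a rough illustration under stated assumptions: the tag names in `MEMORY_PATTERN`, the `format_weight` value, and the linear combination of the two terms are all hypothetical, since the summary does not specify them. The group-relative advantage follows the standard GRPO recipe of normalizing each sampled output's reward by the mean and standard deviation of its group.

```python
import re
from statistics import mean, pstdev
from typing import List, Sequence

# Assumed output format: two tagged sections. The real schema is not given here.
MEMORY_PATTERN = re.compile(
    r"^\s*<within_category>(.+?)</within_category>\s*"
    r"<cross_category>(.+?)</cross_category>\s*$",
    re.DOTALL,
)


def format_reward(memory_text: str) -> float:
    """1.0 if the extractor output follows the assumed structure, else 0.0."""
    return 1.0 if MEMORY_PATTERN.match(memory_text) else 0.0


def rerank_reward(predicted_index: int, target_index: int) -> float:
    """1.0 if the downstream reranker picked the target among the candidates."""
    return 1.0 if predicted_index == target_index else 0.0


def total_reward(memory_text: str, predicted_index: int, target_index: int,
                 format_weight: float = 0.2) -> float:
    """Weighted sum of format and accuracy terms (the weighting is an assumption)."""
    fmt = format_reward(memory_text)
    acc = rerank_reward(predicted_index, target_index)
    return format_weight * fmt + (1.0 - format_weight) * acc


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In a training loop, several candidate memories would be sampled for the same purchase history, each scored with `total_reward` after running the reranker, and the resulting advantages used to weight the policy-gradient update. When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.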

Best for: Research Scientist, AI Product Manager, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.