AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning

Source: Machine Learning · Field: Technology & Digital / Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

AMARIS (Memory-Augmented Rubric Improvement System) improves rubric-based reward shaping for fine-tuning large language models (LLMs) with reinforcement learning (RL). Prior methods discard evaluation diagnostics once they have been used. AMARIS instead grounds each rubric modification in the accumulated training history. It analyzes individual rollouts, aggregates the findings into step-level summaries, and retrieves relevant historical context from a persistent evaluation memory through two channels: static retrieval (the most recent steps) and dynamic retrieval (semantically matched past summaries). It then revises the rubrics based on these accumulated analyses. The system runs asynchronously alongside the standard RL loop and adds approximately 5% time overhead. Experiments in both closed-ended and open-ended domains show that AMARIS consistently outperforms the baseline methods, and ablation studies confirm that static and dynamic memory retrieval each contribute to performance.

Key takeaway

For AI engineers fine-tuning LLMs with RL, a memory-augmented rubric system like AMARIS is worth considering: in the reported experiments it outperformed baseline methods at roughly 5% extra training time. Integrating a persistent evaluation memory lets rubric updates build on earlier diagnostics instead of re-deriving evaluation principles at every step. It may also support curriculum-like progression as the rubric evolves with the policy, potentially improving model quality or reducing the training needed to reach it.

Key insights

In the reported experiments, a persistent evaluation memory improves rubric-based reward shaping for LLM fine-tuning with RL.

Principles

Method

AMARIS analyzes rollouts, summarizes findings, retrieves historical context from persistent memory, and updates rubrics asynchronously to improve RL training.
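
The asynchronous arrangement can be sketched with a background worker thread. This is a minimal sketch, not the authors' code: `analyze_rollout`, `revise_rubric`, and the "promote only recurring issues" rule are hypothetical placeholders for LLM-based rollout analysis and rubric rewriting, and retrieval is reduced to the static (most recent steps) channel for brevity. The RL loop only enqueues work and reads the latest rubric snapshot, so it never waits on analysis.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class StepSummary:
    step: int
    issues: frozenset[str]  # aggregated findings for this RL step


@dataclass(frozen=True)
class Rubric:
    version: int = 0
    criteria: tuple[str, ...] = ()


def analyze_rollout(rollout: str) -> str:
    # Hypothetical analyzer; a real system would ask an LLM judge to diagnose the rollout.
    return "too short" if len(rollout) < 20 else "ok"


def summarize_step(step: int, findings: list[str]) -> StepSummary:
    # Aggregate per-rollout findings into one step-level summary.
    return StepSummary(step, frozenset(f for f in findings if f != "ok"))


def revise_rubric(rubric: Rubric, summary: StepSummary, context: list[StepSummary]) -> Rubric:
    # Hypothetical rule: promote an issue to a criterion only if it also appears
    # in retrieved history, i.e. it recurs rather than being a one-off.
    past = set().union(*(s.issues for s in context))
    new = [f"penalize: {i}" for i in sorted(summary.issues & past)
           if f"penalize: {i}" not in rubric.criteria]
    if not new:
        return rubric
    return Rubric(rubric.version + 1, rubric.criteria + tuple(new))


class RubricUpdater:
    """Runs analyze -> summarize -> retrieve -> revise in a background thread."""

    def __init__(self, k_recent: int = 3):
        self.rubric = Rubric()
        self.memory: list[StepSummary] = []  # persistent evaluation memory
        self.k_recent = k_recent  # must be >= 1
        self._lock = threading.Lock()
        self._jobs: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, step: int, rollouts: list[str]) -> None:
        # Called from the RL loop; returns immediately so training never waits on analysis.
        self._jobs.put((step, rollouts))

    def current_rubric(self) -> Rubric:
        # Rubrics are immutable, so returning the reference gives a consistent snapshot.
        with self._lock:
            return self.rubric

    def close(self) -> None:
        # The sentinel stops the worker after it drains all previously queued steps.
        self._jobs.put(None)
        self._worker.join()

    def _run(self) -> None:
        while (job := self._jobs.get()) is not None:
            step, rollouts = job
            summary = summarize_step(step, [analyze_rollout(r) for r in rollouts])
            context = self.memory[-self.k_recent:]  # static retrieval only
            self.memory.append(summary)
            revised = revise_rubric(self.current_rubric(), summary, context)
            with self._lock:
                self.rubric = revised


if __name__ == "__main__":
    updater = RubricUpdater()
    batches = [["ok but short", "a sufficiently long, complete answer"],
               ["tiny", "another long and detailed response"],
               ["brief", "x"]]
    for step, rollouts in enumerate(batches, start=1):
        rubric = updater.current_rubric()  # reward shaping would score with this snapshot
        updater.submit(step, rollouts)     # analysis for this step runs in the background
    updater.close()
    print(updater.current_rubric())
```

Publishing each revised rubric as a new immutable object means the reward function always scores against a consistent snapshot; an update simply takes effect from whichever step next reads it, which is one way to keep the update path off the training loop's critical path.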

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.