AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning

2026-05-18 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

AMARIS, a Memory-Augmented Rubric Improvement System, improves rubric-based reward shaping for fine-tuning Large Language Models (LLMs) with Reinforcement Learning (RL). Prior methods discard evaluation diagnostics after immediate use; AMARIS instead grounds rubric modifications in long-term training history. It analyzes individual rollouts, aggregates the findings into step-level summaries, and retrieves relevant historical context from a persistent evaluation memory using both static retrieval (recent steps) and dynamic retrieval (semantically matched steps). Rubrics are then updated from the accumulated analyses. The system runs asynchronously alongside the normal RL loop and adds approximately 5% time overhead. In experiments on both closed-ended and open-ended domains, AMARIS consistently outperforms baseline methods, and ablation studies confirm that static and dynamic memory retrieval each contribute to performance.

Key takeaway

For AI engineers fine-tuning LLMs with RL, a memory-augmented rubric system like AMARIS is worth considering: in the reported experiments it consistently outperforms baseline methods at approximately 5% time overhead. Consider adding a persistent evaluation memory so that rubric updates build on earlier findings instead of re-deriving evaluation principles at every step, which can also support a curriculum-like progression and may improve model quality.

Key insights

In the reported experiments, persistent evaluation memory improves rubric-based reward shaping for RL fine-tuning of LLMs, with both static and dynamic retrieval contributing.

Principles

Accumulate evaluation knowledge over time.
Combine static and dynamic memory retrieval.
Run rubric improvement asynchronously to keep overhead low (approximately 5% in the reported setup).
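
To make the first two principles concrete, here is a minimal sketch of a persistent evaluation memory that combines static retrieval (the most recent steps) with dynamic retrieval (the most similar older steps). Everything in it is an assumption for illustration: the names `StepSummary` and `EvaluationMemory` are hypothetical, and a bag-of-words cosine similarity stands in for whatever semantic matching the original system uses.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class StepSummary:
    """Hypothetical record: one step-level summary of evaluation findings."""
    step: int
    text: str


def _bow(text: str) -> Counter:
    # Bag-of-words vector; a placeholder for a real semantic embedding.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class EvaluationMemory:
    """Append-only store of step summaries with two retrieval modes."""

    def __init__(self) -> None:
        self._entries: list[StepSummary] = []

    def add(self, summary: StepSummary) -> None:
        self._entries.append(summary)

    def retrieve(
        self, query: str, n_recent: int = 2, k_similar: int = 2
    ) -> list[StepSummary]:
        # Static retrieval: the last n_recent summaries, regardless of content.
        recent = self._entries[-n_recent:] if n_recent > 0 else []
        recent_steps = {s.step for s in recent}
        # Dynamic retrieval: older summaries ranked by similarity to the query.
        older = [s for s in self._entries if s.step not in recent_steps]
        q = _bow(query)
        ranked = sorted(older, key=lambda s: _cosine(q, _bow(s.text)), reverse=True)
        return ranked[:k_similar] + recent
```

Excluding the recent steps from the similarity search keeps the two retrieval modes from returning the same entry twice, which is one simple way to make them complementary.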

Method

AMARIS analyzes rollouts, summarizes findings, retrieves historical context from persistent memory, and updates rubrics asynchronously to improve RL training.
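
The analyze, summarize, and update stages described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the issue tags, the `RolloutAnalysis` and `Rubric` types, and the threshold rule in `update_rubric` are all hypothetical, and the rule simply stands in for whatever update procedure AMARIS actually applies. The point it shows is that a rubric change is made only when the current step and the retrieved history agree.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RolloutAnalysis:
    """Hypothetical per-rollout diagnostic: a list of issue tags."""
    rollout_id: int
    issues: list[str]


@dataclass
class Rubric:
    criteria: list[str] = field(default_factory=list)


def summarize_step(step: int, analyses: list[RolloutAnalysis]) -> dict:
    # Count how many rollouts in this step exhibit each issue (once per rollout).
    counts = Counter(issue for a in analyses for issue in set(a.issues))
    return {"step": step, "n_rollouts": len(analyses), "issue_counts": dict(counts)}


def update_rubric(
    rubric: Rubric, summary: dict, history: list[dict], min_share: float = 0.5
) -> Rubric:
    """Assumed rule: add a criterion for an issue that is frequent in the
    current step AND already appeared in the retrieved history."""
    n = summary["n_rollouts"] or 1
    seen_before = {issue for h in history for issue in h["issue_counts"]}
    criteria = list(rubric.criteria)
    for issue, count in sorted(summary["issue_counts"].items()):
        criterion = f"Penalize: {issue}"
        frequent = count / n >= min_share
        if frequent and issue in seen_before and criterion not in criteria:
            criteria.append(criterion)
    return Rubric(criteria=criteria)


analyses = [
    RolloutAnalysis(0, ["missing units", "verbose"]),
    RolloutAnalysis(1, ["missing units"]),
    RolloutAnalysis(2, []),
]
summary = summarize_step(7, analyses)
history = [{"step": 3, "n_rollouts": 4, "issue_counts": {"missing units": 3}}]
updated = update_rubric(Rubric(["Answer is correct"]), summary, history)
```

Here `updated.criteria` gains a criterion for "missing units" (frequent now and seen before) but not for "verbose" (seen in only one of three rollouts).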

In practice

Implement persistent evaluation memory.
Use both recent and semantically matched history.
Run evaluation improvements asynchronously.
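
One way to run rubric improvement asynchronously is to hand it to a background worker and swap in the new rubric only once it is ready, so the training loop never waits on it. The sketch below uses Python's standard `concurrent.futures`; the functions `improve_rubric` and `train`, and the `time.sleep` calls that stand in for real work, are hypothetical placeholders rather than anything taken from the original system.

```python
from __future__ import annotations

import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional


def improve_rubric(rubric: list[str], step: int) -> list[str]:
    """Hypothetical stand-in for the slow analysis-and-update work."""
    time.sleep(0.05)
    return rubric + [f"criterion derived from step {step}"]


def train(num_steps: int) -> list[str]:
    rubric = ["answer is correct"]
    pending: Optional[Future] = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for step in range(num_steps):
            # Adopt the improved rubric if the background job has finished;
            # otherwise keep training with the current one (no blocking).
            if pending is not None and pending.done():
                rubric = pending.result()
                pending = None
            time.sleep(0.02)  # placeholder for rollouts + policy update
            # Launch at most one improvement job at a time.
            if pending is None:
                pending = pool.submit(improve_rubric, list(rubric), step)
        # After the last step, collect any job still in flight.
        if pending is not None:
            rubric = pending.result()
    return rubric


final_rubric = train(num_steps=5)
```

A thread pool is a reasonable fit under the assumption that the improvement work is dominated by waiting on model calls or I/O; CPU-bound analysis would call for a process pool or a separate service instead.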

Topics

AMARIS
Rubric-Based Reinforcement Learning
Reward Shaping
Large Language Models
Evaluation Memory

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.