ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents

2026-04-15 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

ReviewGrounder is a rubric-guided, tool-integrated multi-agent framework for improving the substantiveness of LLM-generated peer reviews. It targets a common failure of LLM reviewers, superficial comments, by incorporating explicit rubrics and contextual grounding, two components that existing LLM-based review systems often underuse. The framework splits reviewing into a drafting stage and a grounding stage, in which initial drafts are enriched through targeted evidence consolidation. To measure substantiveness, the authors introduce REVIEWBENCH, a benchmark that scores review text against paper-specific rubrics derived from official guidelines, paper content, and human reviews. In experiments, ReviewGrounder, using a Phi-4-14B drafter and a GPT-OSS-120B grounding stage, outperforms stronger baselines such as GPT-4.1 and DeepSeek-R1-670B, both in alignment with human judgments and in rubric-based quality across eight dimensions.

Key takeaway

For AI scientists and NLP engineers developing peer review support systems, ReviewGrounder demonstrates a clear path to overcoming superficial LLM outputs. Your systems should integrate explicit rubrics and a multi-stage, evidence-grounded approach to significantly improve review quality and alignment with human standards, even with smaller LLMs. Consider adopting the drafting and grounding paradigm to enhance the depth and utility of automated feedback.

Key insights

Integrating rubrics and contextual grounding significantly enhances LLM-generated peer review quality and substantiveness.

Principles

Decompose complex tasks into distinct stages.
Ground LLM outputs with external evidence.
Utilize explicit rubrics for quality assessment.
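To make the grounding principle concrete, here is a minimal sketch of attaching evidence from the paper to each review claim. It assumes the paper has already been split into passages and uses plain token overlap as a stand-in for real retrieval; `ground_claims` and `GroundedPoint` are hypothetical names, and this is not the retrieval or tool use that ReviewGrounder itself performs.

```python
import re
from dataclasses import dataclass


@dataclass
class GroundedPoint:
    claim: str
    evidence: list[str]  # paper passages sharing the most terms with the claim


def _tokens(text: str) -> set[str]:
    # Lowercase word tokens: a deliberately crude stand-in for a real retriever.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ground_claims(claims: list[str], passages: list[str], k: int = 1) -> list[GroundedPoint]:
    """Attach the k passages with the highest token overlap to each review claim."""
    passage_tokens = [_tokens(p) for p in passages]
    grounded = []
    for claim in claims:
        claim_tokens = _tokens(claim)
        scored = [(len(claim_tokens & pt), i) for i, pt in enumerate(passage_tokens)]
        # Highest overlap first; ties broken by earlier passage position.
        scored.sort(key=lambda s: (-s[0], s[1]))
        # Passages with zero overlap are never treated as evidence.
        evidence = [passages[i] for score, i in scored[:k] if score > 0]
        grounded.append(GroundedPoint(claim, evidence))
    return grounded


paper = [
    "We evaluate on three benchmarks with five random seeds.",
    "Table 2 reports ablations of the grounding stage.",
]
points = ground_claims(["The ablation of the grounding stage is unclear."], paper)
print(points[0].evidence)
# prints: ['Table 2 reports ablations of the grounding stage.']
```

A claim that comes back with an empty evidence list is a candidate for revision or removal, which is the kind of check a grounding stage exists to perform.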

Method

ReviewGrounder employs a multi-agent framework, separating review generation into a drafting stage and a grounding stage to enrich initial drafts with targeted evidence, guided by paper-specific rubrics.
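The drafting-then-grounding split can be sketched as two chained model calls. This is a simplified illustration under stated assumptions: the `drafter` and `grounder` callables, the prompt wording, and the `Rubric` and `review_paper` names are all hypothetical, and the sketch omits the tool use and multiple agents of the actual framework. The stub models at the bottom only show how data flows from draft to grounded review.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function mapping a prompt to model text
# (for example, a wrapper around a locally served drafter or grounding model).
LLM = Callable[[str], str]


@dataclass
class Rubric:
    dimension: str  # hypothetical dimension label, e.g. "soundness"
    criterion: str  # paper-specific expectation a good review should address


def build_rubric_block(rubrics: list[Rubric]) -> str:
    return "\n".join(f"- [{r.dimension}] {r.criterion}" for r in rubrics)


def review_paper(paper_text: str, rubrics: list[Rubric], drafter: LLM, grounder: LLM) -> str:
    rubric_block = build_rubric_block(rubrics)
    # Stage 1: a smaller model writes an initial draft guided by the rubric.
    draft = drafter(
        "Draft a peer review of the paper below. Address each rubric item.\n"
        f"Rubric:\n{rubric_block}\n\nPaper:\n{paper_text}"
    )
    # Stage 2: a second model revises the draft, tying each point to evidence.
    return grounder(
        "Revise this draft review. For every criticism, cite the specific section, "
        "table, or equation it refers to, and drop points the paper does not support.\n"
        f"Rubric:\n{rubric_block}\n\nPaper:\n{paper_text}\n\nDraft:\n{draft}"
    )


# Stub models, for illustration only.
def fake_drafter(prompt: str) -> str:
    return "DRAFT: baselines are weak."


def fake_grounder(prompt: str) -> str:
    # Echo the draft it received to show that stage 2 sees stage 1's output.
    return "GROUNDED: " + prompt.split("Draft:\n", 1)[1]


rubrics = [Rubric("soundness", "Are baselines tuned comparably?")]
print(review_paper("Paper body goes here.", rubrics, fake_drafter, fake_grounder))
# prints: GROUNDED: DRAFT: baselines are weak.
```

Injecting the two models as plain callables keeps the sketch independent of any particular inference API and makes it easy to pair a small drafter with a larger grounding model, as in the configuration summarized above.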

In practice

Use Phi-4-14B for initial draft generation.
Employ GPT-OSS-120B for evidence grounding.
Develop paper-specific rubrics for evaluation.
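Rubric-based evaluation in the spirit of REVIEWBENCH can be approximated by checking a review against paper-specific rubric items and reporting coverage per dimension. This sketch assumes a binary judge per item and averages within each dimension; the `RubricItem` fields, the keyword-matching `keyword_judge`, and the dimension names are hypothetical, and the benchmark's actual eight dimensions and scoring protocol may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class RubricItem:
    dimension: str             # hypothetical dimension name, e.g. "soundness"
    requirement: str           # what a substantive review of this paper should raise
    keywords: tuple[str, ...]  # crude proxy used only by the demo judge below


# A judge decides whether a review satisfies one rubric item. In practice this
# would likely be an LLM call; it is injected so any implementation fits.
Judge = Callable[[str, RubricItem], bool]


def keyword_judge(review: str, item: RubricItem) -> bool:
    text = review.lower()
    return any(k.lower() in text for k in item.keywords)


def score_review(review: str, rubric: list[RubricItem], judge: Judge = keyword_judge) -> dict[str, float]:
    """Return the fraction of satisfied rubric items for each dimension."""
    hits: dict[str, list[bool]] = defaultdict(list)
    for item in rubric:
        hits[item.dimension].append(judge(review, item))
    return {dim: sum(v) / len(v) for dim, v in hits.items()}


rubric = [
    RubricItem("soundness", "Questions whether baselines were tuned", ("baseline",)),
    RubricItem("soundness", "Asks for variance across seeds", ("seed", "variance")),
    RubricItem("clarity", "Notes the undefined notation in Section 3", ("notation",)),
]
review = "The baselines look under-tuned, and the notation in Section 3 is never defined."
print(score_review(review, rubric))
# prints: {'soundness': 0.5, 'clarity': 1.0}
```

Keyword matching is only a placeholder; swapping in an LLM-based judge with the same `Judge` signature leaves `score_review` unchanged.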

Topics

ReviewGrounder
Peer Review Automation
LLM-based Review
Rubric-Guided Agents
Multi-Agent Systems

Code references

EigenTom/ReviewGrounder

Best for: AI Scientist, NLP Engineer, Research Scientist


Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.