Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A study analyzing peer review reports from top AI conference proceedings reveals significant shifts in review characteristics following the emergence of Large Language Models (LLMs). The researchers examined linguistic features such as length and complexity, automatically annotated the evaluation aspects addressed in review sentences, and used a maximum likelihood estimation method to identify reports that were potentially LLM-modified or LLM-generated. The findings indicate that, since the emergence of LLMs, review texts have become longer and more fluent and exhibit more standardized linguistic patterns, especially those written by reviewers with lower confidence scores. Emphasis on summaries and surface-level clarity has increased, while attention to deeper evaluative dimensions such as originality, replicability, and nuanced critical reasoning has declined. This suggests that LLMs are altering the core evaluative functions of peer review.

Key takeaway

AI Scientists and Research Scientists involved in peer review should be aware that LLM assistance may inadvertently shift a review's focus toward surface-level aspects and away from critical dimensions such as originality and replicability. When drafting reviews with LLM tools, consciously prioritize and articulate deeper evaluative feedback to maintain the quality and rigor of academic discourse.

Key insights

LLMs are making peer reviews longer and more fluent but reducing focus on deep evaluative dimensions.

Principles

Method

The study used maximum likelihood estimation to identify reviews that were potentially LLM-modified or LLM-generated, and automatically annotated the evaluation aspects of review sentences to track how the focus of reviews changed.
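
The summary does not describe the exact form of the estimator, so the following is only a minimal sketch of one common way to apply maximum likelihood estimation to this kind of question: a two-component mixture model that estimates what share of a set of reviews looks LLM-produced. It assumes each review already has a likelihood under a human-written reference distribution and under an LLM-written reference distribution. The function names and the toy numbers are hypothetical and do not come from the study.

```python
import math


def mixture_log_likelihood(alpha, human_probs, llm_probs):
    """Log-likelihood of the corpus under (1 - alpha) * P_human + alpha * P_llm.

    human_probs[i] and llm_probs[i] are the (assumed, precomputed) likelihoods
    of review i under the human and LLM reference distributions; both must be > 0.
    """
    return sum(
        math.log((1.0 - alpha) * p + alpha * q)
        for p, q in zip(human_probs, llm_probs)
    )


def estimate_llm_fraction(human_probs, llm_probs, tol=1e-6):
    """Estimate alpha in [0, 1] by maximizing the mixture log-likelihood.

    The log-likelihood is concave in alpha, so a ternary search is sufficient.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if mixture_log_likelihood(m1, human_probs, llm_probs) < mixture_log_likelihood(
            m2, human_probs, llm_probs
        ):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    return (lo + hi) / 2.0


# Toy, made-up likelihoods: three reviews look human-written, one looks LLM-written.
toy_human_probs = [0.9, 0.9, 0.9, 0.1]
toy_llm_probs = [0.1, 0.1, 0.1, 0.9]
toy_estimate = estimate_llm_fraction(toy_human_probs, toy_llm_probs)
```

With these toy values the estimate comes out near 0.19 rather than 0.25, because the reference distributions overlap and the model discounts the single LLM-looking review accordingly. A formulation like this yields a corpus-level share rather than a verdict on any individual review, which is one reason to treat such estimates as "potentially" LLM-assisted, as the summary does.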

Best for: AI Scientist, Research Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.