Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI
Summary
A study of peer review reports from top AI conference proceedings finds significant shifts in review characteristics following the emergence of Large Language Models (LLMs). The researchers examined linguistic features such as length and complexity, automatically annotated the evaluation aspects addressed in review sentences, and used a maximum likelihood estimation method to identify reports that were potentially modified or generated by LLMs. The findings indicate that since LLMs emerged, review texts have become longer and more fluent and exhibit more standardized linguistic patterns, especially among reviewers with lower confidence scores. Reviews now place more emphasis on summaries and surface-level clarity, while attention to deeper evaluative dimensions such as originality, replicability, and nuanced critical reasoning has declined. This suggests that LLMs are altering the core evaluative functions of peer review.
Key takeaway
If you are an AI Scientist or Research Scientist who reviews papers, be aware that LLM assistance may inadvertently shift a review's focus toward surface-level aspects and away from critical dimensions such as originality and replicability. Deliberately prioritize and articulate deeper evaluative feedback to maintain the quality and rigor of academic discourse, especially when using LLM tools to draft reviews.
Key insights
LLMs are making peer reviews longer and more fluent while reducing their focus on deeper evaluative dimensions.
Principles
- LLM use correlates with standardized review language.
- Surface-level clarity gains prominence in LLM-assisted reviews.
Method
The study used a maximum likelihood estimation method to identify potentially LLM-assisted reviews, and automatically annotated the evaluation aspect addressed by each review sentence to track how the focus of reviews changed over time.
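To illustrate how maximum likelihood estimation can quantify LLM influence, the sketch below uses a common two-component mixture formulation. This is an assumption: the summary does not specify the paper's exact model. Each token is treated as drawn from (1 - alpha) * P_human + alpha * P_llm, where P_human and P_llm are word-frequency distributions assumed to be estimated beforehand from reference texts known to be human-written and LLM-generated. The function name `estimate_llm_fraction` and the reference distributions are hypothetical.

```python
import math
from typing import Iterable, Mapping


def estimate_llm_fraction(
    tokens: Iterable[str],
    p_human: Mapping[str, float],
    p_llm: Mapping[str, float],
    tol: float = 1e-6,
) -> float:
    """Maximum likelihood estimate of the LLM mixture weight alpha.

    Hypothetical model: each token ~ (1 - alpha) * p_human + alpha * p_llm.
    Tokens that neither reference distribution can score are skipped.
    """
    # Keep (p_human, p_llm) probability pairs for scorable tokens only.
    pairs = [
        (p_human.get(t, 0.0), p_llm.get(t, 0.0))
        for t in tokens
        if p_human.get(t, 0.0) > 0.0 or p_llm.get(t, 0.0) > 0.0
    ]
    if not pairs:
        raise ValueError("no scorable tokens")

    def log_likelihood(alpha: float) -> float:
        total = 0.0
        for ph, pl in pairs:
            p = (1.0 - alpha) * ph + alpha * pl
            if p <= 0.0:
                return -math.inf  # this alpha makes an observed token impossible
            total += math.log(p)
        return total

    # The log-likelihood is concave in alpha (a sum of logs of affine
    # functions), so ternary search on [0, 1] converges to the maximizer.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1) < log_likelihood(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


# Toy example: "b" is rare in human text but common in LLM text.
p_h = {"a": 0.9, "b": 0.1}
p_l = {"a": 0.1, "b": 0.9}
print(round(estimate_llm_fraction(["a"] * 5 + ["b"] * 5, p_h, p_l), 3))
```

An estimate like this describes the share of LLM-like text across the tokens it is given. Pooling many reviews yields a more stable figure, but a pooled alpha does not by itself label any individual review as LLM-generated.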
In practice
- Monitor review length and fluency for LLM influence.
- Analyze review content for shifts in evaluative focus.
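The two monitoring practices above can be approximated with simple corpus statistics. The sketch below is a minimal illustration under stated assumptions, not the study's feature set. Length is a whitespace word count. Mean sentence length and type-token ratio serve as crude complexity proxies. Evaluation aspects are assumed to arrive as one label per sentence from some upstream annotator. The function names and aspect labels ("summary", "clarity", "originality") are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Sequence


def linguistic_profile(review: str) -> Dict[str, float]:
    """Crude length and complexity proxies for one review (hypothetical features)."""
    words = review.split()
    # Naive sentence split on runs of ., !, ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", review) if s.strip()]
    return {
        "word_count": float(len(words)),
        "mean_sentence_length": len(words) / max(len(sentences), 1),
        "type_token_ratio": len({w.lower() for w in words}) / max(len(words), 1),
    }


def aspect_shares(labels: Sequence[str]) -> Dict[str, float]:
    """Fraction of annotated sentences that carry each evaluation-aspect label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {aspect: n / total for aspect, n in counts.items()} if total else {}


def aspect_shift(before: Sequence[str], after: Sequence[str]) -> Dict[str, float]:
    """Change in each aspect's share between two periods (after minus before)."""
    b, a = aspect_shares(before), aspect_shares(after)
    return {k: a.get(k, 0.0) - b.get(k, 0.0) for k in sorted(set(a) | set(b))}


print(linguistic_profile("The method is clear. Results are weak!"))
print(aspect_shift(["originality", "summary"],
                   ["summary", "summary", "clarity", "originality"]))
```

Averaging `linguistic_profile` over reviews submitted before and after a chosen cutoff date, and comparing `aspect_shift` across the same split, is a simplified version of the before/after comparison the summary describes. A real analysis would need proper tokenization and a validated aspect annotator.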
Topics
- Large Language Models
- Peer Review
- Academic Communication
- Linguistic Analysis
- Evaluation Aspects
Best for: AI Scientist, Research Scientist, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.