[R] ICLR: Guess which peer review is human or AI?
Summary
A game hosted on reviewer3.com challenges participants to tell human-written ICLR peer reviews from AI-generated ones. The game presents two reviews of the same paper, and many players suspect it is a data collection effort by "reviewer3" to evaluate its AI models. Participants quickly found several heuristics for spotting the human-written review: shorter text, simpler wording, and no formatting such as bolding, italics, or LaTeX equations. Many players reported near-perfect scores by consistently labeling the shortest or least formatted text as human. The consensus among players is that current LLM-generated reviews are often insubstantial, repetitive, and ultimately time-wasting.
Key takeaway
For AI Scientists evaluating the utility of LLMs in academic peer review, this game highlights current limitations. Your models likely produce reviews that are easy to identify as AI-generated: compared with human reviews, they tend to be longer, more heavily formatted, and superficial in content. Focus on prompts that encourage deeper analysis and specific feedback in a concise, plainly formatted style to improve AI review quality and reduce detection.
Key insights
Human-written peer reviews stand out from AI-generated ones by their brevity, simpler language, and lack of formatting.
Principles
- Shorter text often indicates human authorship in reviews.
- AI output quality correlates with prompt quality.
In practice
- Treat the shorter, less formatted review in a pair as the likelier human one.
- Craft detailed prompts for higher quality AI text generation.
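The length-and-formatting heuristic that players described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the game's or any player's actual method: the function names `formatting_count` and `guess_human`, the regex patterns, and the tie-breaking order (formatting first, then word count) are all hypothetical choices.

```python
import re

# Assumed formatting markers; the summary names bolding, italics, and LaTeX.
FORMAT_PATTERNS = [
    r"\*\*[^*]+\*\*",                   # markdown bold, e.g. **Strengths:**
    r"(?<!\*)\*[^*\s][^*]*\*(?!\*)",    # markdown italics, e.g. *novel*
    r"\$[^$]+\$",                       # inline LaTeX, e.g. $O(n)$
    r"\\[a-zA-Z]+",                     # LaTeX commands, e.g. \alpha
]


def formatting_count(text: str) -> int:
    """Count occurrences of the assumed formatting markers in a review."""
    return sum(len(re.findall(pattern, text)) for pattern in FORMAT_PATTERNS)


def guess_human(review_a: str, review_b: str) -> str:
    """Return "A" or "B" for the review guessed to be human-written.

    Hypothetical rule: the less formatted review wins; if formatting is
    equal, the review with fewer words wins; full ties go to "A".
    """
    score_a = (formatting_count(review_a), len(review_a.split()))
    score_b = (formatting_count(review_b), len(review_b.split()))
    return "A" if score_a <= score_b else "B"


if __name__ == "__main__":
    plain = "The paper is fine but the baselines are weak."
    styled = (
        "**Strengths:** The method achieves $O(n)$ complexity. "
        "**Weaknesses:** The evaluation is limited to small datasets."
    )
    print(guess_human(plain, styled))
```

A rule this crude reflects how little signal players said they needed; it says nothing about review quality and would likely fail once generated reviews are prompted to be short and plain.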
Topics
- AI Peer Review
- AI Content Detection
- Large Language Models
- ICLR Reviews
- AI Writing Quality
Best for: AI Scientist, AI Researcher, Research Scientist, Prompt Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.