[R] ICLR: Guess which peer review is human or AI?

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, quick

Summary

A game hosted on reviewer3.com challenges participants to distinguish human-written from AI-generated peer reviews from ICLR. The game presents two reviews for a given paper, and many players suspect it is a data collection effort by "reviewer3" to evaluate its AI models. Participants quickly identified several heuristics for spotting the human-written review: shorter length, simpler wording, and a lack of formatting such as bolding, italics, or LaTeX equations. Many players reported near-perfect scores by consistently assigning the shortest or least formatted text to the human reviewer. The consensus among players is that current LLM-generated reviews are often insubstantial, repetitive, and ultimately time-wasting.
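
The heuristic players described, assigning the shorter or less formatted of two reviews to the human, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the set of formatting markers (Markdown bold, inline LaTeX, bullets, headings), and the tie-breaking rule are assumptions, not anything taken from the game or from reviewer3.com.

```python
import re

# Assumed formatting markers (illustrative, not taken from the game itself):
# Markdown bold, inline LaTeX, bullet list items, and Markdown headings.
_FORMATTING = re.compile(
    r"\*\*.+?\*\*"      # **bold**
    r"|\$[^$\n]+\$"     # $inline LaTeX$
    r"|^\s*[-*]\s+"     # bullet list item at the start of a line
    r"|^#{1,6}\s",      # Markdown heading
    re.MULTILINE,
)


def formatting_score(review: str) -> int:
    """Count the formatting markers found in a review."""
    return len(_FORMATTING.findall(review))


def guess_human(review_a: str, review_b: str) -> str:
    """Return 'a' or 'b' for the review guessed to be human-written.

    Prefers the less formatted review; ties are broken by shorter length.
    """
    key_a = (formatting_score(review_a), len(review_a))
    key_b = (formatting_score(review_b), len(review_b))
    return "a" if key_a <= key_b else "b"


if __name__ == "__main__":
    plain = "The idea is fine but the experiments are weak. Reject."
    styled = (
        "**Summary**\n\nThe paper proposes $f(x)$ as a new objective.\n\n"
        "- **Strength:** clear writing\n- **Weakness:** limited baselines"
    )
    print(guess_human(plain, styled))
```

A rule this simple reportedly working well says more about how uniform the AI-generated reviews were than about the quality of the detector.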

Key takeaway

For AI scientists evaluating the utility of LLMs in academic peer review, this game highlights current limitations. Players could readily tell AI-generated reviews from human ones: the human review was usually the shorter, plainer one, while the AI review tended to be longer, more heavily formatted, and superficial or repetitive in content. Focus on developing prompts that encourage deeper analysis and specific feedback to improve AI review quality.

Key insights

AI-generated peer reviews are often detectable because human reviews tend to be shorter, more simply worded, and unformatted, while AI reviews are longer and more heavily formatted.

Best for: AI Scientist, AI Researcher, Research Scientist, Prompt Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.