[D] ICML 2026: Policy A vs Policy B impact on scores discussion

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, medium

Summary

An informal poll and community discussion around ICML 2026 suggest that paper review scores may differ between the two LLM-review policies. Policy A, the stricter policy that prohibits LLM use, shows a lower average score (mean 3.23, std dev 0.55, 36 samples) than Policy B, which permits limited LLM assistance (mean 3.47, std dev 0.80, 19 samples), a gap of 0.24 points. Reviewer confidence runs the other way: Policy A reviews report higher average confidence (3.54) than Policy B reviews (3.22). The sample is small and self-selected (55 responses in total), but the pattern is consistent with external research indicating that AI-generated reviews tend to be more lenient and positive. The discussion raises concerns about fairness and about how LLM assistance may influence a review's tone, its breadth of knowledge, and its assessment of novelty.
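To put the reported gap in context, here is a minimal sketch that computes Welch's unequal-variance t statistic directly from the summary figures above. It assumes the reported standard deviations are sample standard deviations and that the responses are independent; neither is confirmed by the poll, and the function name `welch_from_stats` is only illustrative.

```python
import math

# Summary statistics as reported in the informal poll.
# Assumption: the "std dev" values are sample standard deviations.
policy_a = {"mean": 3.23, "std": 0.55, "n": 36}
policy_b = {"mean": 3.47, "std": 0.80, "n": 19}


def welch_from_stats(g1, g2):
    """Return (t, df) for Welch's t-test computed from summary statistics."""
    v1 = g1["std"] ** 2 / g1["n"]  # squared standard error of group 1's mean
    v2 = g2["std"] ** 2 / g2["n"]  # squared standard error of group 2's mean
    t = (g2["mean"] - g1["mean"]) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (g1["n"] - 1) + v2 ** 2 / (g2["n"] - 1))
    return t, df


t_stat, dof = welch_from_stats(policy_a, policy_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Under these assumptions the statistic comes out at roughly t = 1.17 with about 27 degrees of freedom, below the two-sided 5% critical value of about 2.05. That fits the caveat about sample size: the poll points in a direction but does not establish a difference on its own.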

Key takeaway

If you submit to conferences that run more than one LLM review policy, be aware that policies allowing LLM assistance may produce slightly higher average scores and more lenient reviews. A paper reviewed under a stricter, no-LLM policy may therefore receive scores that look harsher by comparison. Consider advocating for score normalization across review policy groups so that submissions are evaluated on a comparable scale.
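One simple form such normalization could take is a per-policy z-score, sketched below using the poll's summary figures as stand-ins for real group statistics. This is an illustration of the general idea, not a procedure the conference or the discussion proposes. The `GROUP_STATS` table and `normalized_score` helper are hypothetical names, and the approach assumes the two groups review papers of comparable underlying quality.

```python
# Per-policy mean and standard deviation, taken from the informal poll.
# Assumption: in practice these would come from the full set of reviews.
GROUP_STATS = {
    "A": {"mean": 3.23, "std": 0.55},
    "B": {"mean": 3.47, "std": 0.80},
}


def normalized_score(raw_score, policy):
    """Express a raw score in standard deviations from its policy group's mean."""
    stats = GROUP_STATS[policy]
    return (raw_score - stats["mean"]) / stats["std"]


# The same raw score of 3.5 sits at a different position in each group.
z_a = normalized_score(3.5, "A")
z_b = normalized_score(3.5, "B")
print(f"Policy A: z = {z_a:.2f}, Policy B: z = {z_b:.2f}")
```

With these figures, a raw 3.5 is about 0.49 standard deviations above the Policy A mean but only about 0.04 above the Policy B mean, which is the kind of gap normalization is meant to surface.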

Key insights

LLM-assisted peer review may produce more lenient scores and more polished reviews than review under a stricter, human-only policy.

Method

An informal poll collected paper scores, the applicable review policy, and respondents' perceptions of review harshness or leniency, giving a snapshot of community observations on LLM-assisted peer review.

Best for: AI Scientist, AI Researcher, Research Scientist, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.