Testing suggests Google's AI Overviews tell millions of lies per hour

· Source: AI - Ars Technica · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Emerging Technologies & Innovation · Depth: Novice, short

Summary

A new analysis by The New York Times, conducted with the startup Oumi, found that Google's Gemini-powered AI Overviews answer correctly about 90 percent of the time. The evaluation used the SimpleQA benchmark, a list of more than 4,000 questions with verifiable answers. Initial tests run on Gemini 2.5 showed 85 percent accuracy, which rose to 91 percent after the Gemini 3 update. Extrapolating a roughly 10 percent error rate across all Google searches suggests AI Overviews generates tens of millions of incorrect answers every day. Google disputes the findings: spokesperson Ned Adriance says SimpleQA itself contains incorrect information and that the study "doesn't reflect what people are actually searching on Google." The company prefers its own SimpleQA Verified test.

Key takeaway

For CTOs and VPs of Engineering evaluating AI search integration: even 90% accuracy in AI Overviews means a large volume of errors at scale. Add verification layers, or clearly tell end users that summaries can be wrong, especially for critical information, rather than relying on the AI's summary alone.
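
One way to picture a verification layer is a check that only treats an AI summary as confirmed when independent sources agree with it, and attaches a notice otherwise. The sketch below is purely illustrative: `Verdict`, `verify_summary`, the `min_agreeing` threshold, and the notice wording are all hypothetical names chosen for this example, and a real system would need far more than exact string matching.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    answer: str
    verified: bool
    notice: str  # text shown to the end user when the answer is unverified


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so formatting differences are ignored.
    return " ".join(text.lower().split())


def verify_summary(ai_answer: str, source_answers: list[str], min_agreeing: int = 2) -> Verdict:
    """Mark an AI answer as verified only if enough independent sources match it.

    Hypothetical helper: the threshold and the matching rule are assumptions.
    """
    key = _normalize(ai_answer)
    agreeing = sum(1 for s in source_answers if _normalize(s) == key)
    if agreeing >= min_agreeing:
        return Verdict(ai_answer, True, "")
    return Verdict(
        ai_answer,
        False,
        "AI-generated summary; not confirmed by independent sources.",
    )


confirmed = verify_summary("1969", ["1969", " 1969 ", "1968"])
unconfirmed = verify_summary("1969", ["1968"])
print(confirmed.verified, unconfirmed.verified)
```

The design choice here is to fail toward disclosure: when agreement cannot be established, the answer is still returned, but with a notice the interface can display.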

Key insights

Google's AI Overviews reach about 90% accuracy on SimpleQA, but the remaining 10% error rate translates to tens of millions of incorrect answers per day at Google's scale.
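
The arithmetic behind that extrapolation is simple: expected wrong answers equal the number of answers shown multiplied by the error rate. The sketch below works per one million overviews, since the summary does not state how many AI Overviews Google shows each day; the `PER_MILLION` basis and the function name are assumptions for illustration, and only the 85, 90, and 91 percent accuracy figures come from the reporting.

```python
def expected_wrong_answers(answers_shown: int, accuracy: float) -> float:
    """Expected number of incorrect answers for a given volume and accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return answers_shown * (1.0 - accuracy)


# Illustrative basis only, NOT a reported traffic figure.
PER_MILLION = 1_000_000

# Accuracy figures cited in the summary: Gemini 2.5, headline rate, Gemini 3.
for accuracy in (0.85, 0.90, 0.91):
    wrong = expected_wrong_answers(PER_MILLION, accuracy)
    print(f"{accuracy:.0%} accurate: about {wrong:,.0f} wrong per million overviews")
```

Scaling the per-million figure by the real daily volume, whatever it is, gives the daily total; a one-point change in accuracy shifts the result by 10,000 errors per million answers.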

Method

The analysis fed the more than 4,000 verifiable questions in the SimpleQA evaluation to AI Overviews, then compared each answer against the known facts to calculate accuracy percentages.
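
The scoring step can be sketched as a comparison of each answer against a reference answer, followed by a ratio. The summary does not describe how the analysis actually graded answers, so the normalized exact-match rule below is a simplifying assumption, and the four sample questions are made-up placeholders rather than SimpleQA items.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count as errors."""
    return " ".join(text.lower().split())


def accuracy(answers: dict[str, str], reference: dict[str, str]) -> float:
    """Fraction of reference questions answered correctly (missing answers count as wrong)."""
    if not reference:
        raise ValueError("reference set is empty")
    correct = sum(
        1
        for question_id, expected in reference.items()
        if normalize(answers.get(question_id, "")) == normalize(expected)
    )
    return correct / len(reference)


# Placeholder data for illustration only.
reference = {"q1": "Paris", "q2": "1969", "q3": "Jupiter", "q4": "Au"}
answers = {"q1": "paris", "q2": "1969", "q3": "Saturn", "q4": " au "}

score = accuracy(answers, reference)
print(f"Accuracy: {score:.0%}")
```

Exact matching is strict: a correct answer phrased differently from the reference would be marked wrong, which is one reason the choice of grading rule affects the reported percentage.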

Best for: CTO, VP of Engineering/Data, Director of AI/ML, Tech Journalist, AI Product Manager, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.