Is Google Search Deteriorating? Measuring Google's Search Quality in 2022
Summary
A recent analysis, prompted by discussions on Hacker News, investigates the perceived decline in Google Search quality. The author, with prior experience in Search Measurement at YouTube, Twitter, and Microsoft, conducted a human evaluation study using Surge AI's raters. This involved 250 raters providing recent search queries, explaining their intent, and rating Google's Search Engine Results Pages (SERPs) on a 1-5 scale. The study also included side-by-side comparisons of Google and Bing SERPs for the same queries. While Google generally outperformed Bing, specific examples highlighted instances where Bing provided superior results, particularly for queries requiring precise intent understanding or local information, such as "databricks series b valuation," "Natural ways to heal cats who have allergies," "what is message blocking on iphone," and "Indianapolis free COVID PCR tests." The article suggests potential reasons for Google's perceived deterioration, including prioritizing ad revenue, the shift of content beyond traditional webpages, increased reliance on ML, and the inherent difficulty in accurately measuring search quality.
Key takeaway
For product managers overseeing search platforms, understanding the limitations of traditional metrics like clicks and time spent is crucial. You should consider implementing human evaluation methodologies, similar to the Surge AI approach, to gain deeper insights into user intent satisfaction and identify specific areas for improvement. This can help in prioritizing development efforts and ensuring your search product effectively meets evolving user needs, especially as content diversifies beyond traditional web pages.
Key insights
Human evaluation measures search engine quality more rigorously than traditional metrics such as clicks and time spent.
Principles
- Clicks and time spent are poor search quality metrics.
- Having raters evaluate their own recent queries keeps the evaluation representative of real usage.
Method
Human raters score each Search Engine Results Page (SERP) on a 1-5 scale according to how well it satisfies the intent behind their own query. They also compare it side by side with a competitor engine's SERP for the same query.
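As a rough illustration of how ratings like these could be aggregated, the Python sketch below computes a mean 1-5 score per engine and side-by-side win rates. It is not the study's actual analysis: the `SideBySideRating` record, the `summarize` function, and the sample scores are all hypothetical names and made-up values chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SideBySideRating:
    """One rater's judgment of two SERPs for the same query (hypothetical schema)."""
    query: str
    score_a: int  # rater's 1-5 score for engine A's SERP
    score_b: int  # rater's 1-5 score for engine B's SERP


def summarize(ratings):
    """Return mean scores and win/tie rates for a list of SideBySideRating."""
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        for score in (r.score_a, r.score_b):
            if not 1 <= score <= 5:
                raise ValueError(f"score out of 1-5 range for query {r.query!r}")

    n = len(ratings)
    a_wins = sum(r.score_a > r.score_b for r in ratings)
    b_wins = sum(r.score_b > r.score_a for r in ratings)
    return {
        "mean_a": mean(r.score_a for r in ratings),
        "mean_b": mean(r.score_b for r in ratings),
        "a_win_rate": a_wins / n,
        "b_win_rate": b_wins / n,
        "tie_rate": (n - a_wins - b_wins) / n,
    }


# Made-up scores for illustration only; these are not results from the study.
sample = [
    SideBySideRating("query 1", 5, 3),
    SideBySideRating("query 2", 2, 4),
    SideBySideRating("query 3", 4, 4),
    SideBySideRating("query 4", 5, 4),
]
summary = summarize(sample)
```

Reporting win rates alongside mean scores is a deliberate choice here: a mean can hide the individual queries where the lower-rated engine did better, and those cases are often the most useful ones to inspect.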
In practice
- Use human raters for nuanced search quality assessment.
- Compare search engines side-by-side for clearer distinctions.
Topics
- Search Quality Measurement
- Human Evaluation
- Search Engine Comparison
- Google Search
- Bing Search
Best for: Product Manager, AI Product Manager, Data Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Surge AI Blog.