Did Google’s AI agents really build an operating system for $916?

2026-05-22 · Source: AI as Normal Technology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, short

Summary

At its developer conference, Google launched Gemini 3.5 Flash and Antigravity 2.0, claiming that its AI agents built an operating system for $916.92 using a single prompt and 2.6 billion tokens. The authors challenge this claim, noting that the "single prompt" was thousands of lines long and that the effort that went into generating it was not disclosed. They point to unclear standards for human intervention, no reporting of dry runs, and no analysis of code copying, even though toy OS projects are acknowledged to be common. Because no prompts, code, or logs were released, the claim cannot be independently evaluated. While crediting Google for disclosing the cost, the authors stress the need for rigorous "open-world evaluations" of complex, long-horizon AI tasks, and they advocate for independent evaluators so that credibility does not rest on vendor claims alone.

Key takeaway

If you are an independent evaluator or AI researcher assessing claims about complex AI agents, demand full transparency about prompts, generated code, and execution logs. Google's $916 OS claim, despite its cost disclosure, shows how "single prompt" narratives can obscure extensive human effort and potential code regurgitation. Rigorous analysis, including similarity checks, is essential for validating agent capabilities and establishing credible "open-world evaluation" standards that go beyond vendor press releases.

Key insights

Google's AI agent OS claim lacks transparency, highlighting the need for rigorous "open-world evaluation" methodologies for complex AI tasks.

Principles

"Single prompt" claims can be misleading if prompt length and generation effort are undisclosed.
Open-world evaluations require new methodological norms for rigor.
Independent evaluators are crucial for making AI vendor claims credible.

In practice

When evaluating AI agent claims, demand full disclosure of prompts, code, and logs.
Conduct similarity analysis to detect code copying in agent-generated software.
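
The similarity check mentioned above could be approached in many ways, and the original article does not prescribe a method. What follows is a minimal sketch of one common technique, Jaccard similarity over k-token shingles, assuming the agent's output and a reference corpus of known toy OS code are available as C-style source text. The function names, the shingle size k=5, and the file names in the demo are all hypothetical and chosen only for illustration.

```python
import re


def tokenize(source: str) -> list[str]:
    """Split C-style source into tokens after dropping comments.

    Assumption: comment stripping is naive and will also strip
    comment-like text inside string literals.
    """
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)  # line comments
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\w\s]", source)


def shingles(tokens: list[str], k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of all k-token windows (shingles)."""
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_matches(generated: str, references: dict[str, str], k: int = 5) -> list[tuple[str, float]]:
    """Score one generated file against each reference, highest similarity first."""
    gen = shingles(tokenize(generated), k)
    scores = [
        (name, jaccard(gen, shingles(tokenize(text), k)))
        for name, text in references.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical snippets standing in for agent output and a reference corpus.
    generated = "void kputc(char c) { /* write to VGA */ vga[pos++] = c; }"
    references = {
        "toy_os_a/console.c": "void kputc(char c) {\n    vga[pos++] = c; // emit\n}",
        "toy_os_b/sched.c": "int pick_next(struct task *t) { return t->id + 1; }",
    }
    for name, score in rank_matches(generated, references):
        print(f"{name}: {score:.2f}")
```

A high score only flags a file for closer manual review; it does not prove copying, and renamed identifiers or reordered code would lower the score without changing the underlying logic. None of this is possible unless the vendor releases the generated code, which is the disclosure the article calls for.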

Topics

AI Agents
Open-World Evaluations
Prompt Engineering
Code Generation
Gemini 3.5 Flash
Antigravity 2.0

Best for: Research Scientist, AI Scientist, Director of AI/ML, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI as Normal Technology.