Deepmind's research AI occasionally solves what humans can't and mostly gets everything else wrong

Source: The Decoder · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Research Methodology & Innovation) · Depth: Advanced, medium

Summary

Google Deepmind's AI agent Aletheia, built on Gemini Deep Think, has shown notable capabilities in mathematical research: it independently wrote a math paper, disproved a decade-old conjecture, and identified a cryptography error. However, a systematic evaluation on 700 open math problems found that only 6.5 percent of its answers were genuinely useful; many were fundamentally wrong or answered a trivialized version of the question. The system uses a three-agent architecture that proposes, checks, and revises solutions, and it browses the web to verify sources, though it can still misrepresent what the cited works say. Deepmind also offers guidelines for scientists working with AI, recommending that they treat it as a capable but error-prone junior researcher, and proposes a rating system that classifies both the degree of AI involvement and the scientific significance of a result.

Key takeaway

For AI researchers evaluating how useful advanced AI agents are in scientific discovery, you should adopt a collaborative framework that treats the AI as a junior researcher. Break complex problems into manageable, verifiable sub-tasks, and use balanced prompting to curb the model's tendency toward specification gaming, that is, satisfying the letter of a request rather than its intent. Your ability to verify AI output remains critical: the model's confidence does not guarantee correctness, and faster paper production could trigger a peer-review crisis.

Key insights

AI can achieve breakthroughs in specific research tasks but requires careful human oversight and structured collaboration.

Method

Aletheia uses a three-agent system (proposer, checker, reviser) in a loop, augmented with web browsing for source verification, and can admit when it cannot solve a problem.
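
The proposer, checker, reviser loop can be sketched as follows. This is a minimal illustration under stated assumptions, not Aletheia's implementation: the agents are modeled as plain Python callables, the names `solve`, `Review`, `toy_agents`, and the round cap `max_rounds` are hypothetical, and web browsing for source verification is not modeled (it could be folded into the checker). Returning `None` stands in for the system admitting it cannot solve a problem.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Review:
    """Verdict from the (hypothetical) checker agent."""
    accepted: bool
    feedback: str


# Each agent is modeled as a plain function; in a real system these would be
# model calls (assumption, not Aletheia's actual interface).
Proposer = Callable[[str], str]
Checker = Callable[[str, str], Review]
Reviser = Callable[[str, str, str], str]


def solve(problem: str, propose: Proposer, check: Checker,
          revise: Reviser, max_rounds: int = 5) -> Optional[str]:
    """Propose, check, revise; return an accepted candidate or None to abstain."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    candidate = propose(problem)
    for round_idx in range(max_rounds):
        review = check(problem, candidate)
        if review.accepted:
            return candidate
        # Only revise if another round remains to check the revision.
        if round_idx < max_rounds - 1:
            candidate = revise(problem, candidate, review.feedback)
    # No candidate passed the checker: admit failure instead of guessing.
    return None


def toy_agents(target: int) -> Tuple[Proposer, Checker, Reviser]:
    """Hypothetical stand-ins: find an integer x with x * x == target."""

    def propose(problem: str) -> str:
        return "1"  # naive first guess

    def check(problem: str, candidate: str) -> Review:
        x = int(candidate)
        if x * x == target:
            return Review(True, "verified")
        return Review(False, "too small" if x * x < target else "too large")

    def revise(problem: str, candidate: str, feedback: str) -> str:
        x = int(candidate)
        return str(x + 1 if feedback == "too small" else x - 1)

    return propose, check, revise


if __name__ == "__main__":
    p, c, r = toy_agents(49)
    print(solve("x * x == 49", p, c, r, max_rounds=10))  # 7
    p, c, r = toy_agents(2)
    print(solve("x * x == 2", p, c, r, max_rounds=10))   # None
```

The toy checker is exact, which a model-based checker is not, so the sketch only shows the control flow. Capping the rounds and returning `None` is one simple way to let the loop abstain rather than emit an answer that never passed a check.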

Best for: AI Researcher, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.