DeepMind’s New AI Found A Strange New Way To Think

· Source: Two Minute Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Intermediate, medium

Summary

DeepMind's AlphaProof Nexus AI solved nine of 350 open mathematical problems posed by Paul Erdős, problems that had remained unsolved for decades, at a cost of "a couple hundred dollars per problem" despite a 95.7% failure rate. The system is built on Lean, a formal language in which proofs can be checked by machine. A mathematician states the problem and its solution in Lean and leaves the proof blank. An AI agent attempts the proof, and a separate AI checks the attempt and provides feedback. Crucially, a "cheaper judge AI" compares two previous attempts and selects the "better" one, even if both are incorrect. This iterative process, akin to an Elo-scored tournament for proofs, keeps refining the highest-scoring "bad" attempts until a formal proof is validated. The approach shows how a reliable system can emerge from unreliable AI components through effective "harnessing" and iterative loops. Two caveats apply: it was tested on a subset of problems that are easier to formalize, and smaller models yielded no solutions.
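To make "leaving the proof blank" concrete, here is a minimal Lean 4 sketch. It is a toy statement chosen for illustration, not one of the Erdős problems, and the theorem name is hypothetical. It assumes the blank is written with Lean's standard `sorry` placeholder; the summary does not say how the actual system marks the gap.

```lean
-- Toy statement for illustration only; not one of the Erdős problems.
-- The statement is fixed by the human; only the proof is left open.
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  sorry  -- placeholder: Lean accepts the file but flags the proof as incomplete
```

A completed attempt would replace `sorry` with an actual proof, for example `exact Nat.add_comm a b`, which Lean can then check mechanically.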

Key takeaway

If you are an AI scientist or research scientist building reliable systems for complex problem-solving, shift your focus from solely making models "smarter" to designing robust operational loops and judging mechanisms. Experiment with iterative refinement, in which even unreliable AI outputs are progressively improved through comparative evaluation and formal verification. This lets you build a highly reliable system from less reliable components, but be aware that smaller models may still prove ineffective on truly hard problems.

Key insights

Reliable AI systems emerge from unreliable components via iterative refinement and robust judging loops.

Principles

Method

A mathematician formalizes a problem in Lean, and an AI agent attempts the proof. A checker AI provides feedback, and a judge AI selects the "better" of two attempts in an Elo-scored tournament. The loop repeats until a proof is formally validated.
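The loop described above can be sketched as a toy simulation. This is a minimal sketch under stated assumptions, not DeepMind's implementation: `Attempt`, `judge`, `refine`, `verify`, and `search` are hypothetical names, the hidden `quality` number stands in for how close an attempt is to a valid proof, and the constant `K = 32` is a conventional Elo step size rather than a value from the source. Only the Elo expectation formula is standard.

```python
import random
from dataclasses import dataclass

K = 32.0  # Elo step size; a conventional value, not taken from the source


@dataclass(eq=False)
class Attempt:
    text: str
    quality: float          # hidden toy score in [0, 1]; 1.0 means "proof checks"
    rating: float = 1000.0  # Elo rating, changed only through judged matches


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that a player rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(winner: Attempt, loser: Attempt, k: float = K) -> None:
    """Move rating points from the loser to the winner."""
    delta = k * (1.0 - expected_score(winner.rating, loser.rating))
    winner.rating += delta
    loser.rating -= delta


def judge(a: Attempt, b: Attempt, rng: random.Random):
    """Hypothetical cheap judge: a noisy comparison, so it can be wrong."""
    a_seen = a.quality + rng.gauss(0.0, 0.1)
    b_seen = b.quality + rng.gauss(0.0, 0.1)
    return (a, b) if a_seen >= b_seen else (b, a)


def refine(parent: Attempt, rng: random.Random) -> Attempt:
    """Hypothetical prover step: a revision that may help or hurt."""
    quality = min(1.0, max(0.0, parent.quality + rng.uniform(-0.05, 0.15)))
    return Attempt(text=parent.text + "+", quality=quality)


def verify(attempt: Attempt) -> bool:
    """Stand-in for the formal checker, the only trusted component."""
    return attempt.quality >= 1.0


def search(seed: int = 0, pool_size: int = 8, max_rounds: int = 2000):
    """Refine the top-rated attempt each round until one verifies or rounds run out."""
    rng = random.Random(seed)
    pool = [Attempt(f"draft{i}", rng.uniform(0.0, 0.3)) for i in range(pool_size)]
    for round_no in range(1, max_rounds + 1):
        parent = max(pool, key=lambda x: x.rating)   # highest-scoring "bad" attempt
        child = refine(parent, rng)
        if verify(child):
            return child, round_no
        opponent = rng.choice(pool)
        winner, loser = judge(child, opponent, rng)  # pairwise "which is better?"
        update_elo(winner, loser)
        pool.append(child)
        pool.remove(min(pool, key=lambda x: x.rating))  # keep the pool bounded
    return None, max_rounds
```

Calling `search(seed=0)` returns either a verified attempt with the round in which it was found, or `None` if the round budget runs out. The design point is that the judge never needs to be right about correctness: it only has to rank attempts well enough on average, because acceptance is decided solely by `verify`.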

Best for: AI Scientist, Research Scientist, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Two Minute Papers.