DeepMind’s New AI Found A Strange New Way To Think
Summary
DeepMind's AlphaProof Nexus AI solved nine of 350 open mathematical problems posed by Paul Erdős, which had remained unsolved for decades, at a cost of "a couple hundred dollars per problem" and despite a 95.7% failure rate. The system works in Lean, a formal language in which proofs can be checked mechanically. A mathematician states the problem and outlines a solution in Lean, leaving the proof itself blank. An AI agent attempts the proof, and a separate AI checks the attempt and provides feedback. Crucially, a "cheaper judge AI" compares two previous attempts and picks the "better" one, even if both are incorrect. The process works like an Elo-scored tournament for proofs: new attempts are refined from the highest-scoring "bad" ones until a formal proof is validated. The approach shows how a reliable system can emerge from unreliable AI components through effective "harnessing" and iterative loops. Two caveats apply: it was tested on a subset of problems that are easier to formalize, and smaller models yielded no solutions.
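To make "leaving the proof blank" concrete, here is a minimal Lean 4 sketch. The theorem is a hypothetical toy statement, not one of the Erdős problems, and using `sorry` as the blank is an assumption based on common Lean practice rather than a confirmed detail of AlphaProof Nexus.

```lean
-- Hypothetical toy statement; stands in for a formalized open problem.
-- `sorry` marks the missing proof. Lean accepts the file but warns that
-- the declaration is incomplete, so it does not count as proved.
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  sorry

-- What a successful attempt could look like: once the blank is filled
-- with a term Lean's checker accepts, the statement is formally verified.
theorem toy_add_comm_filled (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

The useful property is that the final acceptance step does not rely on any AI's opinion: either the checker accepts the filled-in proof or it does not.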
Key takeaway
For AI Scientists and Research Scientists building reliable systems for complex problem-solving: shift your focus from only making models "smarter" to also designing robust operational loops and judging mechanisms. Experiment with iterative refinement, in which even unreliable AI outputs are progressively improved through comparative evaluation and formal verification. This lets you build highly reliable systems from less reliable components, but be aware that smaller models may still prove ineffective on truly hard problems.
Key insights
Reliable AI systems emerge from unreliable components via iterative refinement and robust judging loops.
Principles
- Intelligence can reside in the loop around the model, not only in the model itself.
- Iterative refinement with feedback improves unreliable AI outputs.
- Formalized languages enable verifiable AI-generated proofs.
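A little arithmetic shows why an unreliable generator plus a trustworthy checker can still be reliable overall. The sketch below assumes independent attempts and a checker that never accepts a wrong proof. Both are simplifying assumptions: the refinement loop described in the article is not independent sampling, and the function name and the numbers are hypothetical.

```python
def chance_of_verified_success(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds.

    Assumes each try succeeds with probability `p_single` and that the
    checker is sound, so a wrong attempt is never accepted.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # All tries fail with probability (1 - p)^n; success is the complement.
    return 1.0 - (1.0 - p_single) ** attempts


# Hypothetical numbers: a generator that is wrong 95% of the time.
for n in (1, 10, 100):
    print(n, round(chance_of_verified_success(0.05, n), 3))
```

Under these assumptions the failure probability shrinks geometrically with the number of attempts, which is why the checker, not the generator, determines how much the final answer can be trusted.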
Method
A mathematician formalizes a problem in Lean, and an AI attempts the proof. A checker AI provides feedback on each attempt, and a judge AI picks the "better" of two attempts in an Elo-scored tournament. The loop repeats until a proof is formally validated.
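The loop described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DeepMind's implementation: `propose`, `judge`, and `verify` are hypothetical stand-ins for the prover agent (with checker feedback folded in), the cheaper judge AI, and the Lean checker, and the Elo constants (start rating 1000, step size 32) are conventional defaults rather than values from the article.

```python
import random


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings: dict, winner, loser, k: float = 32.0) -> None:
    """Move rating points from loser to winner (k is an assumed step size)."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


def refine_until_verified(propose, judge, verify, seed_attempt,
                          max_rounds: int = 1000, seed: int = 0):
    """Refine from the top-rated failed attempt until `verify` accepts one."""
    rng = random.Random(seed)
    ratings = {seed_attempt: 1000.0}  # pool of failed attempts and ratings
    for _ in range(max_rounds):
        best = max(ratings, key=ratings.get)   # highest-scoring "bad" attempt
        candidate = propose(best)              # prover refines it
        if verify(candidate):                  # formal check ends the loop
            return candidate, ratings
        ratings.setdefault(candidate, 1000.0)
        rivals = [a for a in ratings if a != candidate]
        if not rivals:
            continue
        rival = rng.choice(rivals)             # pair with a previous attempt
        winner = judge(candidate, rival)       # judge picks the "better" one
        loser = rival if winner == candidate else candidate
        update_elo(ratings, winner, loser)
    return None, ratings


def make_toy(target: int):
    """Toy stand-ins: attempts are integers, and only `target` verifies."""
    def propose(parent):
        return parent + 1                      # pretend refinement step
    def judge(a, b):
        # Heuristic preference between two failed attempts.
        return a if abs(a - target) <= abs(b - target) else b
    def verify(attempt):
        return attempt == target
    return propose, judge, verify


proof, ratings = refine_until_verified(*make_toy(3), seed_attempt=0)
print(proof, sorted(ratings))
```

The design point is the division of labor: the judge only has to rank failed attempts well enough to steer the search, while correctness rests entirely on the verifier.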
In practice
- Employ formal languages like Lean for verifiable AI outputs.
- Design iterative refinement loops with comparative judging.
- Prioritize larger AI models for complex, unsolved problems.
Topics
- AlphaProof Nexus
- Mathematical Proofs
- AI System Reliability
- Iterative AI Refinement
- Formal Verification
- AI Judging Systems
Best for: AI Scientist, Research Scientist, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Two Minute Papers.