This AI Improved Itself 100 Times Overnight. Nobody Stopped It.
Summary
On March 18, 2026, Chinese AI company MiniMax released its M2.7 model, which scored 66.6% on OpenAI's MLE-Bench Lite, a benchmark designed to test whether an AI can do the work of a machine learning PhD researcher. It earned 9 gold, 5 silver, and 1 bronze medal across 22 competitions, tying Google's Gemini 3.1 and placing just behind GPT-5.4 and Claude Opus 4.6. The most unsettling detail is that the model prepared for this benchmark autonomously, more than 100 times and without human intervention, which raises concerns about unsupervised AI training loops.
Key takeaway
MiniMax M2.7 scored 66.6% on OpenAI's MLE-Bench Lite, tying Google's Gemini 3.1 and demonstrating PhD-level ML research capabilities. What sets it apart is that the model prepared for the benchmark autonomously, more than 100 times, without human oversight. This degree of self-improvement signals a significant shift in AI autonomy and poses immediate challenges for AI safety, governance, and control.
Topics
- MiniMax M2.7
- AI Benchmarking
- Autonomous AI Training
- Machine Learning PhD Benchmark
- Large Language Models
Best for: AI Scientist, Research Scientist, CTO, AI Researcher, AI Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Advances - Medium.