Standard Intelligence: Training General Intelligence in Pixel Space
Summary
Standard Intelligence, a startup founded by Galen Mead and Devansh Pandey, is pursuing a contrarian approach to general computer agents: full video pre-training on raw computer use. Their thesis, published on April 30, 2026, posits that scaling raw video data, rather than text or screenshots, is the most promising path to truly scalable action data for agents. The company's model learns to predict mouse movements, clicks, and keystrokes directly from screen pixels, akin to Tesla FSD for knowledge work. Despite video's computational expense, Standard Intelligence has achieved significant breakthroughs, including an 11-million-hour computer action dataset, a video encoder that is 50x more token-efficient, and a 30-petabyte storage cluster built for under $500K. Their first foundation model, FDM-1, demonstrates capabilities such as extruding CAD gears in Blender and finding software bugs. Sequoia Capital led their Series A funding.
Key takeaway
For AI engineers evaluating foundation model paradigms, Standard Intelligence's video-first pre-training presents a compelling alternative to language-centric approaches. Consider how raw pixel-based learning could enable more generalizable agents for complex computer tasks. This shift suggests exploring video data pipelines and efficient video encoders for future agent development, potentially yielding agents capable of nuanced interaction beyond text commands.
Key insights
General computer agents can emerge from aggressively scaled raw video pre-training of computer use.
Principles
- Scaling raw video data enables truly generalizable agent actions.
- First principles reasoning can overcome established domain challenges.
- Aggressive data scaling fosters emergent generality in AI models.
Method
The model learns computer use by predicting the next mouse movements, clicks, and keystrokes directly from raw screen pixels.
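One way to picture this method is as supervised next-action prediction: each short window of screen frames is paired with the action the user took next. The sketch below only illustrates that framing. The `Action` type, the `make_training_pairs` helper, the `context` window size, and the toy frames are all hypothetical and do not come from the original article, which does not describe FDM-1's data format or architecture.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical types for illustration only; not FDM-1's actual data format.
Frame = List[List[int]]  # a tiny grayscale "screen" as rows of pixel values


@dataclass(frozen=True)
class Action:
    kind: str      # "move", "click", or "key"
    x: int = 0     # cursor x position (used by "move" and "click")
    y: int = 0     # cursor y position (used by "move" and "click")
    key: str = ""  # key name (used by "key")


def make_training_pairs(
    frames: List[Frame], actions: List[Action], context: int = 2
) -> List[Tuple[List[Frame], Action]]:
    """Pair each window of `context` consecutive frames with the action
    recorded after the last frame in that window.

    Assumes actions[t] is the action taken after observing frames[t].
    """
    if len(frames) != len(actions):
        raise ValueError("frames and actions must have the same length")
    if context < 1:
        raise ValueError("context must be at least 1")
    pairs = []
    for t in range(context - 1, len(frames)):
        window = frames[t - context + 1 : t + 1]
        pairs.append((window, actions[t]))
    return pairs


# Toy recording: four 2x2 frames and the action that followed each one.
frames = [[[i, i], [i, i]] for i in range(4)]
actions = [
    Action("move", x=10, y=20),
    Action("click", x=10, y=20),
    Action("key", key="a"),
    Action("key", key="Enter"),
]

pairs = make_training_pairs(frames, actions, context=2)
for window, target in pairs:
    print(len(window), "frames ->", target)
```

A model trained on such pairs would take the frame window as input and be scored on how well it predicts the target action. The window size is a free design choice in this sketch: a longer context gives the model more history at a higher compute cost.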
In practice
- Extrude CAD gears in Blender using FDM-1.
- Fine-tune FDM-1 for driving tasks in one hour.
- Utilize FDM-1 to explore software state spaces for bug detection.
Topics
- Video Pre-training
- General Agents
- FDM-1
- Computer Use Automation
- Data Scaling
- AI Safety
Best for: Research Scientist, Computer Vision Engineer, AI Scientist, AI Engineer, Investor
Editorial summary, takeaway, and curation by AIssential. Original article published by Sequoia Capital.