Turing Award Winner: Postgres, Disagreeing with Google, Future Problems | Mike Stonebraker
Summary
Turing Award winner Mike Stonebraker, a pivotal figure in database technology, shares insights from his career, including the origins of Ingres and Postgres. He attributes Postgres's success to its extensible type system and contrasts it with earlier approaches such as CODASYL and IBM's IMS. Stonebraker advocates a "one size fits none" database philosophy, arguing that specialized systems such as column stores (Vertica, ClickHouse) or stream processors (StreamBase) significantly outperform general-purpose databases on specific workloads. He critiques Google's past reliance on MapReduce and eventual consistency, noting that Google eventually abandoned both in favor of systems like Spanner. His current work includes the DBOS project, which reimagines operating system state management using database technology, and the BEAVER benchmark, which shows large language models achieving 0% accuracy on text-to-SQL over real-world data warehouses because of schema complexity and idiosyncratic data.
Key takeaway
For database architects evaluating system choices and ML engineers building agentic AI, Stonebraker's central point is that specialized database architectures beat "one size fits all" solutions. Prioritize strong consistency and extensible type systems for robust applications. For complex text-to-SQL, recognize that LLMs struggle on real-world data, and consider transforming diverse data sources into tables that a query optimizer can handle rather than relying on current LLM capabilities alone.
Key insights
Specialized database architectures and strong data integrity are crucial for performance and correctness, especially as applications become complex.
Principles
- Database systems must be architected for specific workloads to achieve optimal performance.
- Eventual consistency is generally unsuitable for enterprise applications requiring data integrity.
- Extensible type systems are essential for supporting diverse and evolving data models.
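To make the first principle concrete, here is a minimal sketch of why storage layout matters for an analytic workload. It is an illustration only, not how Vertica or ClickHouse are implemented; the sample rows and the function names (to_columns, total_row_store, total_column_store) are hypothetical. A row layout has to walk every record to sum one field, while a columnar layout reads a single contiguous array.

```python
from array import array

# Hypothetical sample data: (id, region, amount)
ROWS = [(1, "east", 10.0), (2, "west", 20.5), (3, "east", 4.5)]


def to_columns(rows):
    """Pivot row tuples into one sequence per column (a toy column store)."""
    ids, regions, amounts = zip(*rows)
    return {
        "id": array("q", ids),          # packed 64-bit integers
        "region": list(regions),
        "amount": array("d", amounts),  # packed doubles
    }


def total_row_store(rows):
    """Sum one field by visiting every full row."""
    return sum(row[2] for row in rows)


def total_column_store(columns):
    """Sum one field by reading only that column's array."""
    return sum(columns["amount"])


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(total_row_store(ROWS), total_column_store(cols))
```

Both functions return the same total; the difference is how much unrelated data each one has to touch, which is what dominates cost at warehouse scale.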
Method
For complex text-to-SQL, turn all data (SQL, CAD, text) into tables and let a query optimizer handle the joins. Break each query into simpler pieces with explicit "FROM" and "JOIN" clauses.
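The method above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not the approach used in the interview or the benchmark: the table names (parts, cad_dims, notes), their columns, and the sample rows are all hypothetical stand-ins for relational, CAD, and text data that has already been loaded into tables. The query is split into a simple named piece (a CTE) and a final step, each with explicit FROM and JOIN clauses, and the join planning is left to the database.

```python
import sqlite3


def build_demo_db():
    """Load three hypothetical sources (relational, CAD, text) as plain tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE parts (part_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE cad_dims (part_id INTEGER, length_mm REAL);
        CREATE TABLE notes (part_id INTEGER, body TEXT);
        """
    )
    conn.executemany("INSERT INTO parts VALUES (?, ?)", [(1, "bracket"), (2, "hinge")])
    conn.executemany("INSERT INTO cad_dims VALUES (?, ?)", [(1, 120.0), (2, 45.5)])
    conn.executemany("INSERT INTO notes VALUES (?, ?)", [(1, "recheck weld"), (2, "ok")])
    return conn


def long_parts_with_notes(conn, min_length_mm):
    """Answer 'which long parts have notes?' as two simple, explicit steps."""
    sql = """
        WITH long_parts AS (          -- piece 1: parts joined to CAD dimensions
            SELECT p.part_id, p.name, d.length_mm
            FROM parts AS p
            JOIN cad_dims AS d ON d.part_id = p.part_id
            WHERE d.length_mm >= ?
        )
        SELECT lp.name, lp.length_mm, n.body   -- piece 2: attach the text notes
        FROM long_parts AS lp
        JOIN notes AS n ON n.part_id = lp.part_id
        ORDER BY lp.name
    """
    return conn.execute(sql, (min_length_mm,)).fetchall()


if __name__ == "__main__":
    print(long_parts_with_notes(build_demo_db(), 100))
```

Keeping each piece small and explicit makes it easier to check a generated query by hand, one piece at a time.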
In practice
- Use Postgres for general-purpose, low-end database needs due to its community and flexibility.
- Consider specialized database systems (e.g., column stores) for high-end, specific workloads.
- Employ DBOS's transactional workflows for agentic AI applications requiring read-write atomicity.
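The read-write atomicity mentioned in the last item can be illustrated with a plain database transaction. The sketch below uses Python's built-in sqlite3 module and does not use the project's actual workflow API; the accounts table and the transfer and balances helpers are hypothetical. The point is only that the read (the balance check) and the writes (the two updates) either all take effect or none do.

```python
import sqlite3


def make_accounts():
    """Create a hypothetical accounts table; transactions are managed explicitly."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    return conn


def transfer(conn, src, dst, amount):
    """Read the source balance and apply both updates inside one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo any partial work
        raise


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    db = make_accounts()
    transfer(db, "a", "b", 30)
    print(balances(db))
```

An agent step that fails partway through leaves the data as it was, so a retry starts from a consistent state.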
Topics
- Database Systems
- Postgres
- Query Optimization
- Text-to-SQL
- Large Language Models
- Agentic AI
- DBOS
Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by The Peterman Post.