Turing Award Winner: Postgres, Disagreeing with Google, Future Problems | Mike Stonebraker

· Source: The Peterman Post · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Advanced, extended

Summary

Turing Award winner Mike Stonebraker, a pivotal figure in database technology, shares insights from his career, including the origins of Ingres and Postgres. He attributes Postgres's success to its extensible type system and contrasts it with earlier systems such as CODASYL and IBM's IMS. Stonebraker advocates a "one size fits none" database philosophy, arguing that specialized systems such as column stores (Vertica, ClickHouse) or stream processors (StreamBase) significantly outperform general-purpose databases on the workloads they target. He critiques Google's past reliance on MapReduce and eventual consistency, noting that Google later abandoned both in favor of systems like Spanner. His current work includes the DBOS project, which reimagines operating system state management using database technology, and the "Beaver" benchmark, which finds that large language models achieve 0% accuracy on text-to-SQL over real-world data warehouses because of schema complexity and idiosyncratic data.

Key takeaway

For database architects evaluating system choices and ML engineers building agentic AI, Stonebraker's insights underscore the need for specialized database architectures over "one size fits all" solutions. Prioritize strong consistency and extensible type systems for robust applications. When tackling complex text-to-SQL, recognize the limits of LLMs on real-world data and consider transforming diverse data sources into tables for optimized querying rather than relying on current LLM capabilities alone.

Key insights

Specialized database architectures and strong data integrity are crucial for performance and correctness, especially as applications become complex.
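To make the column-store point concrete, here is a minimal sketch of why layout matters for an analytic query. The sales records and the function names are hypothetical, chosen only for illustration. It does not reflect how Vertica or ClickHouse are implemented; it shows only that summing one field in a column layout touches a single list, while a row layout visits every full record.

```python
# Hypothetical toy data: three-field sales records stored row by row.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per field (a column layout)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def total_row_store(records, field):
    """Row layout: every whole record is visited to read a single field."""
    return sum(record[field] for record in records)


def total_column_store(columns, field):
    """Column layout: only the requested column is scanned."""
    return sum(columns[field])


columns = to_columns(rows)
row_total = total_row_store(rows, "amount")
column_total = total_column_store(columns, "amount")
```

Both functions return the same total. The difference is how much data each has to read, which is what dominates cost on wide tables with many rows.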

Principles

Method

For complex text-to-SQL, convert all data sources (SQL databases, CAD files, text) into tables and let a query optimizer handle the joins. Break each query into simpler pieces with explicit "FROM" and "JOIN" clauses.
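The method above can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not the approach used in the Beaver work: the table names (parts, cad_features, notes), the sample rows, and the question are all hypothetical. CAD-derived features and free-text notes are loaded as ordinary tables, the question is split into two simple single-table pieces, and the pieces are combined with explicit FROM and JOIN clauses so the database engine plans the joins.

```python
import sqlite3

# In-memory database; all tables and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (part_id INTEGER PRIMARY KEY, name TEXT, material TEXT);
CREATE TABLE cad_features (part_id INTEGER, feature TEXT, diameter_mm REAL);
CREATE TABLE notes (part_id INTEGER, note TEXT);
""")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [(1, "bracket", "steel"), (2, "housing", "aluminum")],
)
conn.executemany(
    "INSERT INTO cad_features VALUES (?, ?, ?)",
    [(1, "hole", 6.0), (1, "hole", 8.5), (2, "hole", 4.0)],
)
conn.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [(1, "supplier reported cracking near holes"), (2, "no issues")],
)

# Question: which steel parts have a hole wider than 8 mm,
# and what do the notes say about them?

# Piece 1: a simple single-table filter.
steel_parts = "SELECT part_id, name FROM parts WHERE material = 'steel'"

# Piece 2: another single-table filter over the CAD-derived table.
wide_holes = (
    "SELECT DISTINCT part_id FROM cad_features "
    "WHERE feature = 'hole' AND diameter_mm > 8.0"
)

# Combine the pieces with explicit FROM and JOIN clauses;
# the engine's planner decides how to execute the joins.
query = f"""
SELECT p.name, n.note
FROM ({steel_parts}) AS p
JOIN ({wide_holes}) AS h ON h.part_id = p.part_id
JOIN notes AS n ON n.part_id = p.part_id
"""
result = conn.execute(query).fetchall()
```

Each piece can be run and checked on its own before composition, which is the practical benefit of decomposing the query.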

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by The Peterman Post.