Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer
Summary
Simon Hørup Eskildsen founded Turbopuffer after seeing how prohibitively expensive vector search was for article recommendations at Readwise. His goal was a cost-effective "search engine for unstructured data" that connects AI models to full-fidelity knowledge. The architecture relies on recent cloud primitives: S3 strong consistency, NVMe SSDs, and compare-and-swap on object storage. Together these allow a simple, serverless design with no traditional consensus layer. Turbopuffer cut costs by 95% for early customers such as Cursor and met Notion's stringent latency demands, holding to its architectural convictions to the point of buying dark fiber. On the "build vs. buy" dilemma in the age of AI, Turbopuffer offers rapid deployment and has adapted to agentic workloads that drive highly concurrent, parallel queries, which led to a 5x reduction in query pricing. The roadmap includes expanding state-of-the-art full-text search features, scaling to Common Crawl-sized datasets with advanced ANN algorithms, and eventually supporting a broader range of query plans based on customer needs.
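The compare-and-swap primitive mentioned in the summary can be illustrated with a small sketch. This is not Turbopuffer's implementation and not a real S3 client: `InMemoryObjectStore`, `put_if_version`, and `append_entry` are hypothetical names, and the store is an in-memory stand-in that only mimics a conditional write (a write that succeeds only if the object is still at the version the writer last read). Under those assumptions, it shows how several writers can update one shared object safely without a separate consensus layer.

```python
import threading
from typing import Dict, Tuple


class PreconditionFailed(Exception):
    """Raised when the stored version does not match the expected one."""


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store with conditional writes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # key -> (version, body); a missing key is treated as version 0.
        self._objects: Dict[str, Tuple[int, bytes]] = {}

    def get(self, key: str) -> Tuple[int, bytes]:
        with self._lock:
            return self._objects.get(key, (0, b""))

    def put_if_version(self, key: str, expected_version: int, body: bytes) -> int:
        """Write body only if the object is still at expected_version."""
        with self._lock:
            current_version, _ = self._objects.get(key, (0, b""))
            if current_version != expected_version:
                raise PreconditionFailed(key)
            new_version = current_version + 1
            self._objects[key] = (new_version, body)
            return new_version


def append_entry(store: InMemoryObjectStore, key: str, entry: str,
                 max_retries: int = 1000) -> int:
    """Append one line to a shared object using a read-modify-CAS loop."""
    for _ in range(max_retries):
        version, body = store.get(key)
        lines = body.decode().splitlines()
        lines.append(entry)
        try:
            return store.put_if_version(key, version, "\n".join(lines).encode())
        except PreconditionFailed:
            # Another writer got in first: re-read and try again.
            continue
    raise RuntimeError("too many conflicting writers")
```

A writer that loses the race simply re-reads and retries, so no update is silently overwritten. The design choice being illustrated is that the storage layer itself arbitrates between writers, which is what removes the need for a dedicated coordination service in this sketch.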
Key takeaway
Turbopuffer delivers a scalable, cost-efficient search engine for unstructured data, achieving 95% cost reductions for customers such as Cursor by building on S3 strong consistency and NVMe SSDs without a traditional consensus layer. ANNv3 searches 100 billion vectors with a p50 latency of 40 ms, and the full-text engine outperforms Lucene on web-scale, LLM-generated queries. This lets AI/ML professionals deploy agentic applications that require massive, concurrent retrieval over vast datasets at significantly lower operational overhead.
Topics
- Vector Search
- Cloud Database Architecture
- Full-Text Search
- AI Agent Workloads
- Cost Optimization
Best for: CTO, VP of Engineering/Data, Entrepreneur, Software Engineer, Data Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Latent Space: The AI Engineer Podcast.