How Airtable Built the Search Layer Behind Their AI Features
Summary
Airtable developed a semantic search layer for its AI features, Omni and linked record recommendations, enabling natural language queries over customer data. The architecture, built on Milvus, was shaped primarily by one observation: 75% of customer databases (bases) are idle in any given week. Key design priorities were 500-millisecond query latency at the 99th percentile, high-throughput writes, horizontal scalability across millions of isolated bases, and self-hosting. Airtable adopted a "one partition per base" strategy within Milvus and overcame the resulting performance issues with hierarchical capping: 400 collections per cluster, each with up to 1,000 partitions. For vector indexing, HNSW was selected for its high recall and predictable performance, at the cost of keeping the index in memory. That memory cost was mitigated by offloading idle ("cold") partitions, about 75% of the total, from memory to storage using Milvus's capabilities. Recovery involves re-embedding data from source using an existing asynchronous pipeline.
Key takeaway
For AI Architects designing vector search systems for multi-tenant applications: prioritize understanding your specific data access patterns and isolation needs over generic algorithm choices. Your system's performance and cost efficiency will hinge on how you manage idle data and how you partition tenants. Consider hierarchical capping for scalability, and reuse existing data pipelines for robust recovery, so that your design aligns with real-world usage.
Key insights
Data access patterns, not algorithms, dictate optimal system architecture for vector search.
Principles
- Flat namespaces in distributed systems often need hierarchical caps (a bounded number of items at each level) to keep performance predictable.
- Vector search involves a tradeoff between memory, latency, and recall.
- Measure actual usage patterns before optimizing for cost.
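To make the memory tradeoff concrete, here is a back-of-the-envelope sketch. It is not Airtable's sizing method. The per-vector formula (float32 vector plus roughly 2*M four-byte links on the base layer) is a common rough approximation for HNSW, and the vector count, dimension, and M value in the usage note are hypothetical. Only the 75% idle fraction comes from the summary.

```python
def hnsw_bytes(num_vectors: int, dim: int, m: int = 16) -> int:
    """Rough HNSW memory estimate (an approximation, not a Milvus figure).

    Each vector stores `dim` float32 values plus about 2*m neighbor links
    of 4 bytes each on the base layer. Upper layers are ignored.
    """
    vector_bytes = 4 * dim
    link_bytes = 2 * m * 4
    return num_vectors * (vector_bytes + link_bytes)


def resident_bytes(total_bytes: int, idle_fraction: float) -> int:
    """Memory still needed if the idle fraction is offloaded to storage."""
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")
    return int(total_bytes * (1.0 - idle_fraction))
```

For example, under these assumptions `hnsw_bytes(1_000_000, 768)` is about 3.2 GB, and `resident_bytes(..., 0.75)` leaves about 0.8 GB in memory if idle data is spread evenly across bases.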
Method
Airtable implemented a hierarchical capping strategy: Milvus clusters hold 400 collections, each with up to 1,000 partitions, ensuring predictable performance for millions of isolated customer bases.
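The capping scheme can be sketched as a simple slot allocator. This is a minimal illustration, not Airtable's implementation: the two caps come from the summary, while the `Slot` and `CappedAllocator` names, the fill order (collections first, then clusters), and the in-memory assignment table are assumptions. No Milvus calls are made.

```python
from dataclasses import dataclass, field
from typing import Dict

COLLECTIONS_PER_CLUSTER = 400      # cap stated in the summary
PARTITIONS_PER_COLLECTION = 1_000  # cap stated in the summary
BASES_PER_CLUSTER = COLLECTIONS_PER_CLUSTER * PARTITIONS_PER_COLLECTION


@dataclass(frozen=True)
class Slot:
    """Hypothetical address of one base's partition."""
    cluster: int
    collection: int
    partition: int


@dataclass
class CappedAllocator:
    """Gives each base one partition, filling collections, then clusters."""
    assigned: Dict[str, Slot] = field(default_factory=dict)

    def slot_for(self, base_id: str) -> Slot:
        # A base always maps to the same partition once assigned.
        if base_id in self.assigned:
            return self.assigned[base_id]
        index = len(self.assigned)
        cluster, within = divmod(index, BASES_PER_CLUSTER)
        collection, partition = divmod(within, PARTITIONS_PER_COLLECTION)
        slot = Slot(cluster, collection, partition)
        self.assigned[base_id] = slot
        return slot


def clusters_needed(num_bases: int) -> int:
    """Ceiling division: clusters required at one partition per base."""
    return -(-num_bases // BASES_PER_CLUSTER)
```

With these caps a cluster holds at most 400,000 bases, so `clusters_needed(1_000_000)` returns 3. Because neither level grows without bound, adding bases means adding clusters rather than stretching a single collection.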
In practice
- Use HNSW for high recall and low latency vector search.
- Offload idle partitions to storage to reduce memory costs.
- Design recovery around existing data pipelines.
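The offloading idea in the list above can be sketched as a small residency tracker. This is an illustration under stated assumptions, not the article's code: the `load` and `release` callbacks are hypothetical stand-ins for whatever the vector database exposes to move a partition into and out of memory, and the one-week idle threshold is an assumption inferred from the "idle weekly" statistic.

```python
import time
from typing import Callable, Dict, List

IDLE_AFTER_SECONDS = 7 * 24 * 60 * 60  # assumption: idle means no query for a week


class PartitionResidency:
    """Tracks which partitions are in memory and offloads the idle ones."""

    def __init__(
        self,
        load: Callable[[str], None],     # hypothetical: bring partition into memory
        release: Callable[[str], None],  # hypothetical: evict partition to storage
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._load = load
        self._release = release
        self._clock = clock
        self._last_access: Dict[str, float] = {}  # resident partitions only

    def touch(self, partition: str) -> None:
        """Call before querying: loads a cold partition and records the access."""
        if partition not in self._last_access:
            self._load(partition)
        self._last_access[partition] = self._clock()

    def offload_idle(self) -> List[str]:
        """Release every resident partition not accessed within the threshold."""
        now = self._clock()
        cold = [
            name
            for name, last in self._last_access.items()
            if now - last >= IDLE_AFTER_SECONDS
        ]
        for name in cold:
            self._release(name)
            del self._last_access[name]
        return cold
```

A periodic job would call `offload_idle()`, while the query path calls `touch()` first, so a cold base pays a one-time load cost on its next query and active bases stay in memory.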
Topics
- Vector Databases
- Milvus
- Semantic Search
- Multi-tenancy
- HNSW Index
- Data Partitioning
- AI Architecture
Best for: AI Architect, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by ByteByteGo Newsletter.