Scaling Airbnb’s identity graph with a unified knowledge graph infrastructure
Summary
Airbnb has migrated its critical identity graph from a third-party Platform-as-a-Service (PaaS) solution to a new, internally managed unified knowledge graph infrastructure. The migration began in 2024. It addresses several challenges: scaling to 7 billion nodes and 11 billion edges, ingesting 5 million new edges daily, and serving complex 4-8 hop queries. The new platform uses JanusGraph with DynamoDB for storage and OpenSearch for indexing, which provides storage separation and full control over graph logic. Key optimizations included custom transaction strategies, parallel query execution for high-fanout queries, and integrated distributed tracing. The migration produced substantial performance gains: 32-93% lower Gremlin read-query latency, a 51% reduction in P95 read latency, and a 56% reduction in P95 write latency. It also brought improved system stability and a 10x increase in write QPS during load tests.
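The 4-8 hop queries above are bounded traversals, and their frontier can grow sharply with each hop. The Python sketch below is a rough illustration only. It runs a hop-limited breadth-first search over a hypothetical in-memory adjacency map. `k_hop_neighbors` and the `Graph` alias are illustrative names, not Airbnb code. Against JanusGraph, the same idea would be written as a Gremlin traversal built with `repeat()` and `times()`.

```python
from collections import deque
from typing import Dict, Hashable, List

# Hypothetical in-memory adjacency map standing in for the real graph store.
Graph = Dict[Hashable, List[Hashable]]


def k_hop_neighbors(graph: Graph, start: Hashable, max_hops: int) -> Dict[Hashable, int]:
    """Return every vertex reachable from `start` within `max_hops`, with its hop distance.

    Breadth-first search visits vertices level by level, so the first time a
    vertex is reached is also its shortest hop distance. Vertices already in
    `distances` are never enqueued again, which stops cycles from being re-expanded.
    """
    distances: Dict[Hashable, int] = {start: 0}
    frontier = deque([start])
    while frontier:
        vertex = frontier.popleft()
        hops = distances[vertex]
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(vertex, []):
            if neighbor not in distances:
                distances[neighbor] = hops + 1
                frontier.append(neighbor)
    return distances


if __name__ == "__main__":
    demo: Graph = {"a": ["b"], "b": ["c", "a"], "c": ["d"], "d": ["e"]}
    print(k_hop_neighbors(demo, "a", 2))  # {'a': 0, 'b': 1, 'c': 2}
```

Each vertex is recorded at its first, shortest hop distance. As a result, the cycle `b -> a` in the demo graph is not expanded a second time.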
Key takeaway
AI Architects and MLOps Engineers managing large-scale graph data should consider building an internally managed graph infrastructure on a tuned open-source engine such as JanusGraph. In Airbnb's case, this approach cut P95 read and write latency by more than 50% and raised write QPS 10x in load tests. It also offers greater control, stability, and room to fine-tune for complex query patterns, while reducing vendor lock-in and operational toil.
Key insights
Building an internal, optimized graph infrastructure can significantly outperform third-party PaaS for large-scale, complex graph workloads.
Principles
- Storage separation enhances flexibility.
- Custom optimizations are crucial for scale.
- Client-side query rewriting boosts performance.
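One way to read "storage separation" is that graph logic talks to storage and indexing only through narrow interfaces. The Python sketch below assumes exactly that. The `EdgeStore` and `PropertyIndex` protocols are hypothetical stand-ins for the roles that DynamoDB and OpenSearch play in the article. Their in-memory implementations let the example run on its own. None of this is JanusGraph's actual storage-backend API.

```python
from typing import Dict, List, Protocol, Set, Tuple


class EdgeStore(Protocol):
    """Hypothetical edge-storage interface (the role DynamoDB plays in the article)."""
    def put_edge(self, src: str, label: str, dst: str) -> None: ...
    def get_out(self, src: str, label: str) -> List[str]: ...


class PropertyIndex(Protocol):
    """Hypothetical property-index interface (the role OpenSearch plays in the article)."""
    def index(self, vertex: str, key: str, value: str) -> None: ...
    def lookup(self, key: str, value: str) -> List[str]: ...


class InMemoryEdgeStore:
    def __init__(self) -> None:
        self._edges: Dict[Tuple[str, str], List[str]] = {}

    def put_edge(self, src: str, label: str, dst: str) -> None:
        self._edges.setdefault((src, label), []).append(dst)

    def get_out(self, src: str, label: str) -> List[str]:
        return list(self._edges.get((src, label), []))


class InMemoryIndex:
    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, str], Set[str]] = {}

    def index(self, vertex: str, key: str, value: str) -> None:
        self._postings.setdefault((key, value), set()).add(vertex)

    def lookup(self, key: str, value: str) -> List[str]:
        return sorted(self._postings.get((key, value), set()))


class GraphService:
    """Graph logic depends only on the two interfaces, so either backend can be swapped."""

    def __init__(self, store: EdgeStore, index: PropertyIndex) -> None:
        self.store = store
        self.index = index

    def add_vertex(self, vertex: str, props: Dict[str, str]) -> None:
        for key, value in props.items():
            self.index.index(vertex, key, value)

    def add_edge(self, src: str, label: str, dst: str) -> None:
        self.store.put_edge(src, label, dst)

    def neighbors_of_match(self, key: str, value: str, label: str) -> List[str]:
        """Find vertices by property via the index, then expand their edges via the store."""
        result: List[str] = []
        for vertex in self.index.lookup(key, value):
            result.extend(self.store.get_out(vertex, label))
        return result


if __name__ == "__main__":
    svc = GraphService(InMemoryEdgeStore(), InMemoryIndex())
    svc.add_vertex("user:1", {"email": "a@example.com"})
    svc.add_edge("user:1", "uses_device", "device:9")
    print(svc.neighbors_of_match("email", "a@example.com", "uses_device"))  # ['device:9']
```

Because `GraphService` never imports a concrete backend, you can replace either store or index without touching traversal logic. That flexibility is the benefit the principle points to.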
Method
Airbnb built a multi-tenant graph infrastructure using JanusGraph, DynamoDB, and OpenSearch, implementing custom transaction strategies, parallel query execution, and client-side Gremlin query rewriting for performance.
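Parallel query execution for high-fanout queries can be pictured as issuing per-vertex neighbor reads concurrently rather than one after another. The Python sketch below illustrates that idea under stated assumptions. `parallel_expand` and the injected `fetch_neighbors` callable are hypothetical names. A thread pool is only one possible mechanism; the article does not say how Airbnb implemented this.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def parallel_expand(
    vertices: Iterable[str],
    fetch_neighbors: Callable[[str], List[str]],
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """Expand each vertex's neighbors concurrently instead of sequentially.

    `fetch_neighbors` stands in for a per-vertex storage read (a network call
    in a real deployment). Overlapping those calls on a thread pool reduces
    wall-clock time when a query fans out to many vertices.
    """
    unique = list(dict.fromkeys(vertices))  # de-duplicate while keeping input order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so they line up with `unique`.
        results = pool.map(fetch_neighbors, unique)
        return dict(zip(unique, results))


if __name__ == "__main__":
    adjacency = {"v1": ["a", "b"], "v2": ["c"], "v3": []}

    def slow_fetch(vertex: str) -> List[str]:
        time.sleep(0.05)  # simulated storage latency
        return adjacency.get(vertex, [])

    print(parallel_expand(["v1", "v2", "v3", "v1"], slow_fetch))
    # {'v1': ['a', 'b'], 'v2': ['c'], 'v3': []}
```

Threads suit this case because each lookup is dominated by I/O wait. With the simulated 50 ms delay, the three distinct lookups overlap instead of running back to back. `max_workers` caps concurrency, so a single high-fanout vertex cannot flood the storage layer.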
In practice
- Evaluate JanusGraph with DynamoDB for scale.
- Implement custom transaction strategies.
- Optimize Gremlin queries client-side.
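The article does not detail Airbnb's client-side rewrites. The sketch below is therefore only an example of the kind of rewrite this practice refers to. It batches many single-start Gremlin lookups that share the same step suffix into one multi-start `g.V(id1, id2, ...)` query, which saves round trips. `Lookup`, `quote`, and `batch_rewrite` are hypothetical names. The merged query returns the union of results, so a caller that needs per-start answers would have to add grouping. Production code would also typically pass ids as bound parameters rather than concatenating strings.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Lookup:
    """Hypothetical client-side description of a single-start traversal."""
    start_id: str
    steps: str  # Gremlin step suffix, e.g. ".out('uses_device')"


def quote(value: str) -> str:
    """Render an id as a single-quoted Gremlin string literal, escaping \\ and '."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def batch_rewrite(lookups: List[Lookup]) -> List[str]:
    """Merge lookups that share a step suffix into one multi-start query each.

    N separate `g.V(x).steps` requests become one `g.V(x1, x2, ...).steps`
    request per distinct suffix. Duplicate start ids are dropped, and first-seen
    order is kept for both suffixes and ids.
    """
    groups: Dict[str, List[str]] = {}
    for lookup in lookups:
        ids = groups.setdefault(lookup.steps, [])
        if lookup.start_id not in ids:
            ids.append(lookup.start_id)
    return [
        "g.V(" + ", ".join(quote(i) for i in ids) + ")" + steps
        for steps, ids in groups.items()
    ]


if __name__ == "__main__":
    pending = [
        Lookup("u1", ".out('uses_device')"),
        Lookup("u2", ".out('uses_device')"),
        Lookup("u1", ".out('has_email')"),
    ]
    for query in batch_rewrite(pending):
        print(query)
    # g.V('u1', 'u2').out('uses_device')
    # g.V('u1').out('has_email')
```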
Topics
- Knowledge Graphs
- Graph Databases
- JanusGraph
- DynamoDB
- Gremlin
- MLOps Infrastructure
- Trust and Safety
Best for: AI Engineer, MLOps Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by The Airbnb Tech Blog - Medium.