Scaling Airbnb’s identity graph with a unified knowledge graph infrastructure

Source: The Airbnb Tech Blog - Medium · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Cloud Computing & IT Infrastructure) · Depth: Advanced, medium

Summary

Airbnb has migrated its critical identity graph from a third-party Platform-as-a-Service (PaaS) solution to a new, internally managed unified knowledge graph infrastructure. The effort, initiated in 2024, addresses significant scaling challenges: a graph of 7 billion nodes and 11 billion edges, 5 million new edges per day, and complex queries spanning 4 to 8 hops. The new platform uses JanusGraph with DynamoDB for storage and OpenSearch for indexing, giving Airbnb storage separation and full control over graph logic. Key optimizations included custom transaction strategies, parallel query execution for high-fanout queries, and integrated distributed tracing. The migration produced substantial performance gains: 32-93% lower Gremlin read-query latency, a 51% reduction in P95 read latency, and a 56% reduction in P95 write latency, along with improved system stability and a 10x increase in write QPS during load tests.
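
To make the cost of multi-hop queries concrete, here is a minimal Python sketch of a hop-bounded breadth-first traversal over a toy in-memory graph. It is an illustration only, not Airbnb's implementation: the node names (`user:1`, `device:a`, and so on), the `toy_graph` dictionary, and the `k_hop_neighbors` function are all hypothetical, and a production system would issue such traversals as Gremlin queries against JanusGraph rather than walk a Python dictionary.

```python
from collections import deque


def k_hop_neighbors(adjacency, start, max_hops):
    """Return {node: hop_distance} for every node within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = distances[node]
        if depth == max_hops:
            # Do not expand past the hop limit.
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = depth + 1
                queue.append(neighbor)
    return distances


# Hypothetical toy graph; the entity types are assumptions for illustration.
toy_graph = {
    "user:1": ["device:a", "email:x"],
    "device:a": ["user:2"],
    "email:x": ["user:3"],
    "user:2": ["phone:p"],
    "phone:p": ["user:4"],
}

reachable = k_hop_neighbors(toy_graph, "user:1", max_hops=4)
```

Each additional hop expands every node found at the previous hop, so the work grows with the fan-out of the nodes visited. That is why deep traversals over a graph with billions of edges are sensitive to storage and query-execution latency.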

Key takeaway

For AI architects or MLOps engineers managing large-scale graph data, consider building an internally managed graph infrastructure on an optimized open-source stack such as JanusGraph. Airbnb's migration cut P95 read and write latency by more than 50% and raised write QPS 10x in load tests. The approach also offers greater control, stability, and room to fine-tune for complex query patterns, while reducing vendor lock-in and operational toil.

Key insights

Building an internal, optimized graph infrastructure can significantly outperform third-party PaaS for large-scale, complex graph workloads.

Method

Airbnb built a multi-tenant graph infrastructure using JanusGraph, DynamoDB, and OpenSearch, implementing custom transaction strategies, parallel query execution, and client-side Gremlin query rewriting for performance.
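
The general idea behind parallel execution for high-fanout queries can be sketched as follows. This is a hedged Python illustration under stated assumptions, not Airbnb's code: `fetch_neighbors` is a hypothetical stand-in for a per-node lookup against the storage backend, and `expand_frontier_parallel` and `toy_graph` are invented for the example. The sketch fans one hop of a traversal out across a thread pool so that slow lookups overlap instead of running one after another.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_neighbors(adjacency, node):
    """Hypothetical stand-in for a remote adjacency lookup for one node."""
    return list(adjacency.get(node, ()))


def expand_frontier_parallel(adjacency, frontier, max_workers=8):
    """Expand every node in the frontier by one hop, running lookups concurrently."""
    frontier = list(frontier)
    if not frontier:
        return set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() submits one lookup per frontier node and yields results in order.
        per_node_results = pool.map(
            lambda node: fetch_neighbors(adjacency, node), frontier
        )
        next_frontier = set()
        for neighbors in per_node_results:
            next_frontier.update(neighbors)
    return next_frontier


# Hypothetical toy graph with one high-fanout node ("hub").
toy_graph = {
    "hub": ["a", "b", "c"],
    "a": ["x"],
    "b": ["x", "y"],
    "c": [],
}

first_hop = expand_frontier_parallel(toy_graph, ["hub"])
second_hop = expand_frontier_parallel(toy_graph, first_hop)
```

Threads suit this pattern when each lookup is I/O-bound, as a network call to a storage service would be. The set-based merge also removes duplicates, so a node reached through several paths is expanded only once on the next hop.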

Best for: AI Engineer, MLOps Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by The Airbnb Tech Blog - Medium.