Netflix Serves 84% of Query Results from Cache with Interval-Aware Caching in Apache Druid
Summary
Netflix has implemented an interval-aware caching strategy for Apache Druid that significantly improves query efficiency for real-time analytics. The approach serves approximately 84% of query results from cache, leading to a 33% reduction in query load on Druid and a 66% improvement in P90 query times. It targets rolling window dashboards, where continuously refreshing queries with slightly shifting time ranges would otherwise trigger redundant computation. Instead of caching full query outputs, the system decomposes results into time-aligned segments and stores intermediate aggregates for fixed intervals. When a new query arrives, cached historical segments are reused, and only the most recent interval is recomputed and merged. This drastically reduces data scans and processing for workloads involving over 10 trillion rows in Apache Druid.
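To make the decomposition concrete, here is a minimal sketch of splitting a rolling window into time-aligned buckets. It is an illustration under stated assumptions, not Netflix's implementation: the helper names `floor_to` and `split_window` are hypothetical, buckets are aligned to the Unix epoch, and any bucket only partly covered by the window is treated as "recompute" rather than cacheable.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_to(ts, granularity):
    """Round a timezone-aware timestamp down to a granularity boundary."""
    return ts - ((ts - EPOCH) % granularity)


def split_window(start, end, granularity):
    """Split [start, end) into aligned buckets.

    Returns (cacheable, recompute):
      cacheable - buckets fully covered by the window, safe to reuse
      recompute - partial buckets at the window edges, always queried fresh
    """
    cacheable, recompute = [], []
    cursor = floor_to(start, granularity)
    while cursor < end:
        bucket_end = cursor + granularity
        lo, hi = max(cursor, start), min(bucket_end, end)
        if lo == cursor and hi == bucket_end:
            cacheable.append((cursor, bucket_end))
        else:
            recompute.append((lo, hi))
        cursor = bucket_end
    return cacheable, recompute


# Two consecutive dashboard refreshes, one minute apart.
minute = timedelta(minutes=1)
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
first_cached, first_fresh = split_window(t0, t0 + timedelta(minutes=3, seconds=30), minute)
second_cached, second_fresh = split_window(t0 + minute, t0 + timedelta(minutes=4, seconds=30), minute)
shared = set(first_cached) & set(second_cached)
```

In this example the two refreshes share two of their three complete buckets, so the second refresh only needs one newly completed bucket plus the partial trailing one. When the window start is itself aligned, only the trailing interval lands in `recompute`, which matches the behaviour described above.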
Key takeaway
For MLOps Engineers managing large-scale real-time analytics platforms like Apache Druid, adopting an interval-aware caching strategy can dramatically reduce query load and improve performance. Consider implementing a proxy layer to decompose rolling window queries into time-aligned segments, reusing historical data while only recomputing recent intervals. This approach can significantly cut infrastructure costs and enhance dashboard responsiveness for your users.
Key insights
Decomposing query results into time-aligned segments enables high cache reuse for rolling window analytics.
Principles
- Cache intermediate aggregates, not full query outputs.
- Separate query structure from time intervals for cache keys.
- Use exponential TTLs for granularity-aligned buckets.
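The second and third principles can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the query is assumed to be a Druid native query expressed as a dict whose time range lives in the `intervals` field, and the TTL policy (doubling with each granularity step of bucket age, with made-up `base_ttl` and `max_ttl` values) is one plausible reading of "exponential TTLs", not a confirmed detail of the Netflix system.

```python
import hashlib
import json


def structure_key(query):
    """Hash everything in the query except its time intervals."""
    structure = {k: v for k, v in query.items() if k != "intervals"}
    canonical = json.dumps(structure, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def bucket_key(query, bucket_start_iso):
    """Cache key for one bucket: query structure plus the bucket start."""
    return f"{structure_key(query)}:{bucket_start_iso}"


def bucket_ttl_seconds(age_seconds, granularity_seconds, base_ttl=5, max_ttl=3600):
    """Assumed policy: TTL doubles per granularity step of age, then caps.

    Recent buckets expire quickly because late-arriving data may still
    change them; older buckets are stable and can live longer.
    """
    steps = max(0, int(age_seconds // granularity_seconds))
    return min(max_ttl, base_ttl * 2 ** min(steps, 32))


# Two refreshes of the same dashboard panel differ only in `intervals`.
q1 = {
    "queryType": "timeseries",
    "dataSource": "events",  # hypothetical datasource name
    "granularity": "minute",
    "aggregations": [{"type": "count", "name": "rows"}],
    "intervals": ["2024-01-01T12:00:00Z/2024-01-01T13:00:00Z"],
}
q2 = dict(q1, intervals=["2024-01-01T12:01:00Z/2024-01-01T13:01:00Z"])
same_structure = structure_key(q1) == structure_key(q2)
```

Because the interval is excluded from the structural hash, both refreshes map to the same key prefix and can share every bucket they have in common.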
Method
Intercept queries, separate structure from time, generate reusable cache keys, store segments in a distributed key-value system, and merge recomputed recent data with cached historical segments.
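The flow above can be sketched end to end with an in-memory dict standing in for the distributed key-value store. Everything here is an assumption made for illustration: `serve_query`, `fake_backend`, and the `panel-42` identifier are hypothetical, buckets are `(start, end)` pairs in epoch seconds, and TTL expiry is left out to keep the example short.

```python
def serve_query(structure_id, closed_buckets, open_bucket, cache, run_backend):
    """Serve one rolling-window query from cached and fresh buckets.

    closed_buckets - complete historical buckets, eligible for caching
    open_bucket    - the most recent interval, always recomputed
    run_backend    - callable that runs the real query for one bucket
    """
    rows = []
    for bucket in closed_buckets:
        key = (structure_id, bucket)
        if key not in cache:  # miss: compute once and store
            cache[key] = run_backend(bucket)
        rows.append((bucket[0], cache[key]))
    # The newest interval is never cached, so the result stays fresh.
    rows.append((open_bucket[0], run_backend(open_bucket)))
    return rows


backend_calls = []


def fake_backend(bucket):
    """Stand-in for a Druid query; returns the bucket length as its 'aggregate'."""
    backend_calls.append(bucket)
    return bucket[1] - bucket[0]


cache = {}
first = serve_query("panel-42", [(0, 60), (60, 120), (120, 180)], (180, 210), cache, fake_backend)
calls_after_first = len(backend_calls)
second = serve_query("panel-42", [(60, 120), (120, 180), (180, 240)], (240, 270), cache, fake_backend)
calls_after_second = len(backend_calls)
```

The first request reaches the backend four times, while the shifted second request reaches it only twice: once for the newly completed bucket and once for the open interval.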
In practice
- Implement an external proxy for caching.
- Reduce result bytes and segment scans by 14x.
- Improve P90 query times by 66%.
Topics
- Apache Druid
- Interval-Aware Caching
- Real-time Analytics
- Rolling Window Queries
- Query Optimization
Best for: AI Architect, MLOps Engineer, CTO, Data Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by InfoQ.