How MakeMyTrip Achieved Millisecond Personalization at Scale with Databricks
Summary
MakeMyTrip, India's largest online travel agency, has implemented Databricks' Real-Time Mode (RTM) in Apache Spark Structured Streaming to achieve millisecond-level latency for its "last-searched" hotels feature. This feature provides real-time, personalized recommendations to millions of daily users across consumer and corporate travel. Previously, Apache Spark's micro-batch mode delivered latencies of one to two seconds, which was too slow. While Apache Flink met latency requirements, MakeMyTrip avoided a dual-engine architecture due to concerns about architectural fragmentation, duplicated business logic, higher operational overhead, consistency risks, and increased infrastructure costs. RTM's innovations, including continuous data flow, pipeline scheduling, and streaming shuffle, enable Spark to process data as it arrives, eliminating micro-batch latency bottlenecks and allowing MakeMyTrip to maintain a single, cost-effective Spark-based pipeline.
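The latency floor that micro-batching imposes can be illustrated with a toy model. The sketch below is not Spark code and does not come from the original article: it assumes events are only picked up at fixed batch boundaries and ignores scheduling and processing time, so the function names and the numbers are purely illustrative.

```python
def micro_batch_wait_ms(arrival_ms: int, interval_ms: int) -> int:
    """Time an event waits for the next batch boundary (toy model).

    Boundaries fall at multiples of interval_ms; an event arriving exactly
    on a boundary is assumed to wait 0 ms.
    """
    return (-arrival_ms) % interval_ms


def mean_wait_ms(arrivals_ms: list[int], interval_ms: int) -> float:
    """Average boundary wait across all events."""
    waits = [micro_batch_wait_ms(t, interval_ms) for t in arrivals_ms]
    return sum(waits) / len(waits)


# Hypothetical workload: one event every 10 ms for 10 seconds.
arrivals = list(range(0, 10_000, 10))

# With a 1-second batch interval, events wait about half an interval on
# average before processing even starts.
print(mean_wait_ms(arrivals, 1_000))

# An engine that handles each event on arrival has no boundary wait in this
# model, which is the effect the summary attributes to continuous data flow.
```

In this simplified model the average wait grows with the batch interval, which is why shrinking or removing the interval matters more than speeding up the per-event logic.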
Key takeaway
For AI Architects and Data Engineers designing real-time personalization systems, consider adopting Apache Spark's Real-Time Mode (RTM) to achieve millisecond-level latencies. This approach allows you to avoid the architectural complexity and operational overhead of a dual-engine setup (e.g., Spark + Flink) while maintaining a unified, cost-effective data pipeline. Evaluate RTM for use cases requiring continuous data flow and immediate processing to enhance user experience and reduce friction.
Key insights
Achieving millisecond-level real-time personalization at scale is possible with Apache Spark's Real-Time Mode.
Principles
- Avoid architectural fragmentation.
- Unified data processing reduces complexity.
- Continuous data flow minimizes latency.
Method
MakeMyTrip's architecture merges B2C/B2B clickstream topics, applies personalization logic via RTM, performs low-latency stateful lookups in Aerospike, and pushes results to Redis for sub-50ms serving.
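The data flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MakeMyTrip's implementation: in-memory dictionaries stand in for Aerospike (state) and Redis (serving cache), and the event fields, key format, and "keep the three most recent hotels" rule are all hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class SearchEvent:
    user_id: str
    hotel_id: str
    channel: str  # "b2c" or "b2b" (hypothetical field)
    ts_ms: int    # event time in milliseconds


def update_last_searched(
    state: Dict[str, List[str]], event: SearchEvent, max_items: int = 3
) -> List[str]:
    """Stateful step: move the searched hotel to the front of the user's list.

    `state` stands in for the Aerospike lookup; max_items is an assumption.
    """
    previous = state.get(event.user_id, [])
    updated = [event.hotel_id] + [h for h in previous if h != event.hotel_id]
    state[event.user_id] = updated[:max_items]
    return state[event.user_id]


def run_pipeline(
    b2c: Iterable[SearchEvent],
    b2b: Iterable[SearchEvent],
    state: Dict[str, List[str]],
    cache: Dict[str, List[str]],
) -> None:
    """Merge both clickstreams by event time and publish results per event.

    `cache` stands in for the Redis serving layer. Each input stream must
    already be sorted by ts_ms, as heapq.merge requires.
    """
    for event in heapq.merge(b2c, b2b, key=lambda e: e.ts_ms):
        cache[f"last_searched:{event.user_id}"] = update_last_searched(state, event)


state: Dict[str, List[str]] = {}
cache: Dict[str, List[str]] = {}
run_pipeline(
    [SearchEvent("u1", "h1", "b2c", 1), SearchEvent("u1", "h2", "b2c", 3)],
    [SearchEvent("u2", "h9", "b2b", 2), SearchEvent("u1", "h1", "b2b", 4)],
    state,
    cache,
)
print(cache)
```

The point of the sketch is the shape of the pipeline: one merged stream, one copy of the personalization logic, and a write to the serving layer for every event rather than once per batch.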
In practice
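- Use RTM when Spark micro-batch latency (one to two seconds) is too slow and millisecond-level latency is required.
- Keep streaming on a single engine instead of running Spark alongside Flink.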
- Employ Aerospike for stateful lookups.
Topics
- MakeMyTrip
- Real-Time Personalization
- Databricks Real-Time Mode
- Apache Spark Structured Streaming
- Low-Latency Data Processing
Best for: Data Engineer, AI Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Databricks.