Agoda Builds Multimodal Content System to Bridge Images and Reviews in Travel Discovery
Summary
Agoda has developed a multimodal content system that integrates hotel images and guest reviews into a unified, topic-based structure. Launched on May 19, 2026, the system aims to give users a consistent understanding of hotel attributes by correlating visual content with written feedback. It operates at large scale, processing over 700 million images and multilingual reviews in more than 40 languages. The core redesign replaces fragmented processing pipelines with a shared taxonomy of topics such as "Pool" or "Breakfast." Images are analyzed by classification models and reviews by NLP pipelines, and both map to these canonical topics. The multimodal data is pre-aggregated offline by PySpark jobs managed with Kubeflow, and the resulting topic-level artifacts are stored in Couchbase for low-latency serving.
Key takeaway
For AI Architects designing large-scale content systems, Agoda's multimodal approach offers a blueprint for unifying disparate content modalities. Consider implementing a shared topic taxonomy to semantically link visual and textual content, which improves consistency and user experience. Pre-aggregating multimodal data offline and serving it from a low-latency layer like Couchbase avoids joining modalities at request time. However, this strategy requires robust topic governance to prevent semantic drift.
Key insights
Agoda unifies hotel images and guest reviews into a shared topic taxonomy for consistent attribute interpretation.
Principles
- Unify content modalities via a shared topic taxonomy.
- Pre-aggregate multimodal data for low-latency retrieval.
- Normalize multilingual content to keep it consistent across languages.
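The normalization principle can be illustrated with a minimal Python sketch. It assumes a hand-written alias table that maps surface terms in several languages to canonical topics. The table, the function names, and the matching rule are all hypothetical and are not taken from Agoda's system, which uses NLP pipelines rather than a lookup table.

```python
import unicodedata
from typing import Optional

# Hypothetical alias table: surface terms in several languages -> canonical topic.
TOPIC_ALIASES = {
    "pool": "Pool",
    "swimming pool": "Pool",
    "piscina": "Pool",
    "schwimmbad": "Pool",
    "breakfast": "Breakfast",
    "desayuno": "Breakfast",
    "petit-déjeuner": "Breakfast",
}


def normalize_term(term: str) -> str:
    """Apply Unicode NFKC, trim whitespace, and casefold so equivalent spellings match."""
    return unicodedata.normalize("NFKC", term).strip().casefold()


# Normalize the keys once so lookups compare like with like.
_NORMALIZED_ALIASES = {normalize_term(k): v for k, v in TOPIC_ALIASES.items()}


def to_canonical_topic(term: str) -> Optional[str]:
    """Return the canonical topic for a term, or None if the term is unknown."""
    return _NORMALIZED_ALIASES.get(normalize_term(term))
```

Returning None for unknown terms, instead of guessing a topic, keeps unmapped vocabulary visible so it can be reviewed and added to the taxonomy deliberately.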
Method
Images are classified and reviews are processed via NLP, and both are mapped to a shared topic taxonomy. Multimodal topic artifacts are pre-aggregated offline by PySpark jobs managed with Kubeflow and served from Couchbase.
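As a rough illustration of mapping both modalities to one taxonomy, the Python sketch below assumes the image classifier emits (image_id, label) pairs and the review pipeline emits (aspect, snippet) pairs. The label names, mapping tables, and the TopicArtifact class are hypothetical stand-ins, not Agoda's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical mappings from each pipeline's raw output to canonical topics.
IMAGE_LABEL_TO_TOPIC = {"outdoor_pool": "Pool", "buffet_table": "Breakfast"}
REVIEW_ASPECT_TO_TOPIC = {"swimming pool": "Pool", "morning buffet": "Breakfast"}


@dataclass
class TopicArtifact:
    """Everything known about one topic, across both modalities."""
    topic: str
    image_ids: list = field(default_factory=list)
    review_snippets: list = field(default_factory=list)


def build_artifacts(image_predictions, review_aspects):
    """Group image and review outputs under their shared canonical topic."""
    artifacts = {}
    for image_id, label in image_predictions:
        topic = IMAGE_LABEL_TO_TOPIC.get(label)
        if topic is None:
            continue  # label not covered by the taxonomy; skip it
        artifacts.setdefault(topic, TopicArtifact(topic)).image_ids.append(image_id)
    for aspect, snippet in review_aspects:
        topic = REVIEW_ASPECT_TO_TOPIC.get(aspect)
        if topic is None:
            continue  # aspect not covered by the taxonomy; skip it
        artifacts.setdefault(topic, TopicArtifact(topic)).review_snippets.append(snippet)
    return artifacts


artifacts = build_artifacts(
    image_predictions=[("img-1", "outdoor_pool"), ("img-2", "lobby")],
    review_aspects=[("swimming pool", "The pool was clean."), ("wifi", "Slow wifi.")],
)
```

Because both pipelines resolve to the same topic key, the "Pool" artifact ends up holding the matching image and the matching review snippet side by side.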
In practice
- Correlate visual content with written feedback.
- Precompute multimodal associations to avoid runtime joins.
- Integrate new content sources into the topic framework.
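The precomputation idea in the list above can be sketched as follows. This Python example uses a plain dictionary as a stand-in for a key-value store such as Couchbase, and the "hotel_id::topic" key format, the record fields, and the function names are assumptions made for illustration only. In a production setting the aggregation step would run as an offline batch job rather than in-process.

```python
import json


def precompute_documents(records):
    """Offline step: fold per-item records into one document per (hotel, topic).

    Each record is a dict with 'hotel_id', 'topic', 'modality' ('image' or
    'review'), and 'value'. Returns a mapping of key -> JSON string.
    """
    docs = {}
    for record in records:
        key = f"{record['hotel_id']}::{record['topic']}"  # hypothetical key format
        doc = docs.setdefault(
            key,
            {"hotel_id": record["hotel_id"], "topic": record["topic"],
             "images": [], "reviews": []},
        )
        bucket = "images" if record["modality"] == "image" else "reviews"
        doc[bucket].append(record["value"])
    return {key: json.dumps(doc, sort_keys=True) for key, doc in docs.items()}


def get_topic(store, hotel_id, topic):
    """Serving step: a single key lookup, with no join across modalities."""
    raw = store.get(f"{hotel_id}::{topic}")
    return json.loads(raw) if raw is not None else None


store = precompute_documents([
    {"hotel_id": "h1", "topic": "Pool", "modality": "image", "value": "img-1"},
    {"hotel_id": "h1", "topic": "Pool", "modality": "review", "value": "Great pool"},
    {"hotel_id": "h1", "topic": "Breakfast", "modality": "review", "value": "Good coffee"},
])
pool_doc = get_topic(store, "h1", "Pool")
```

The trade-off is freshness: new images or reviews only appear once the offline job has rebuilt the affected documents.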
Topics
- Multimodal Content
- Travel Tech
- Image Classification
- Natural Language Processing
- Shared Topic Taxonomy
- Data Orchestration
Best for: NLP Engineer, Computer Vision Engineer, AI Engineer, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by InfoQ.