Multimodal embeddings at scale: AI data lake for media and entertainment workloads
Summary
This post details how to build a scalable multimodal video search system using Amazon Nova models and Amazon OpenSearch Service, enabling natural language search across large video datasets. The solution processes 792,270 videos (8,480 hours) from AWS Open Data Registry datasets in 41 hours, with a first-year cost of $27,328 using on-demand OpenSearch. Key components include Amazon EC2 for compute, Amazon Bedrock Nova Multimodal Embeddings for generating 1024-dimensional audio-visual embeddings, and Nova Pro for adding 10-15 descriptive tags per video. Embeddings are stored in an OpenSearch k-NN index, and tags are stored in a separate text index. The system supports text-to-video, video-to-video, and hybrid search. At a scale of 792K videos, measured query latencies are approximately 76ms for semantic k-NN search, 30ms for BM25 text search, and 106ms for hybrid search.
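The headline figures above imply a sustained ingest rate that is worth working out before sizing compute and concurrency. The short calculation below uses only the numbers quoted in the summary; the variable names are illustrative. It works out to roughly 19,300 videos per hour (about 5.4 per second), an average clip length of about 38.5 seconds, and processing that runs roughly 200 times faster than real-time playback.

```python
# Figures quoted in the summary above.
VIDEOS = 792_270
CONTENT_HOURS = 8_480
PROCESSING_HOURS = 41

# Sustained throughput needed to finish the batch in the stated time.
videos_per_hour = VIDEOS / PROCESSING_HOURS
videos_per_second = videos_per_hour / 3600

# Average clip length and the speed-up over real-time playback.
avg_clip_seconds = CONTENT_HOURS * 3600 / VIDEOS
speedup_vs_realtime = CONTENT_HOURS / PROCESSING_HOURS

if __name__ == "__main__":
    print(f"videos/hour:       {videos_per_hour:,.0f}")
    print(f"videos/second:     {videos_per_second:.2f}")
    print(f"avg clip length:   {avg_clip_seconds:.1f} s")
    print(f"speed-up vs real-time: {speedup_vs_realtime:.0f}x")
```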
Key takeaway
For AI engineers building video content platforms, this solution offers a framework for moving beyond keyword-based search. Consider this Amazon Bedrock and OpenSearch Service architecture if you need natural language and multimodal search to improve content discoverability and user experience. For new deployments, evaluate Nova 2 Lite for improved tagging accuracy and cost-effectiveness.
Key insights
Build scalable multimodal video search using Amazon Nova embeddings and OpenSearch for semantic and hybrid queries.
Principles
- Combine vector and keyword search for accuracy.
- Optimize embedding dimensions for cost efficiency.
- Process video asynchronously for scalability.
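The asynchronous-processing principle can be sketched as a bounded worker pool: every video is submitted, but only a fixed number of jobs are in flight at once. This is a minimal illustration, not the pipeline from the original post. The function names, the `max_in_flight` value, and the example bucket are hypothetical, and `fake_embedding_job` is a stand-in for whatever submits and polls one Bedrock asynchronous invocation.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def run_with_limit(
    video_uris: Iterable[str],
    process_one: Callable[[str], Awaitable[str]],
    max_in_flight: int,
) -> dict[str, str]:
    """Process every URI while keeping at most max_in_flight jobs active."""
    semaphore = asyncio.Semaphore(max_in_flight)
    results: dict[str, str] = {}

    async def worker(uri: str) -> None:
        # The semaphore blocks here once max_in_flight jobs are running.
        async with semaphore:
            results[uri] = await process_one(uri)

    await asyncio.gather(*(worker(uri) for uri in video_uris))
    return results


async def fake_embedding_job(uri: str) -> str:
    """Hypothetical stand-in for submitting and polling one async job."""
    await asyncio.sleep(0.01)  # simulates waiting for the job to finish
    return f"{uri}.embedding.json"


if __name__ == "__main__":
    uris = [f"s3://example-bucket/videos/{i}.mp4" for i in range(10)]
    done = asyncio.run(run_with_limit(uris, fake_embedding_job, max_in_flight=3))
    print(len(done), "jobs completed")
```

In a real deployment the limit would be chosen to stay within the account's concurrency quota for the model, and failed jobs would need retry handling, which this sketch omits.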
Method
The method involves ingesting videos into S3, generating 1024-dimensional audio-visual embeddings and descriptive tags using Amazon Nova models, then indexing these into separate OpenSearch k-NN and text indexes for multimodal search capabilities.
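As a rough illustration of the two-index layout, the request bodies below define a k-NN index for the 1024-dimensional embeddings and a separate text index for the tags. Only the dimension and the split into two indexes come from the summary. The field names and the HNSW, Lucene, and cosine-similarity settings are assumptions chosen for the example, not details confirmed by the original post.

```python
import json

EMBEDDING_DIM = 1024  # embedding dimension stated in the summary


def build_vector_index_body(dim: int = EMBEDDING_DIM) -> dict:
    """Request body for a k-NN index holding one embedding per document."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "video_id": {"type": "keyword"},  # hypothetical field name
                "s3_uri": {"type": "keyword"},    # hypothetical field name
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dim,
                    # Assumed settings; tune engine and space for your data.
                    "method": {
                        "name": "hnsw",
                        "engine": "lucene",
                        "space_type": "cosinesimil",
                    },
                },
            }
        },
    }


def build_text_index_body() -> dict:
    """Request body for the tag index, searched with BM25 by default."""
    return {
        "mappings": {
            "properties": {
                "video_id": {"type": "keyword"},  # joins back to the vector index
                "tags": {"type": "text"},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_vector_index_body(), indent=2))
    print(json.dumps(build_text_index_body(), indent=2))
```

With the opensearch-py client, each body could be passed to `client.indices.create(index=..., body=...)`; the index names are left to the reader.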
In practice
- Use 1024-dimensional embeddings instead of the 3072-dimensional option for 3x storage cost savings.
- Put a job queue in front of the Bedrock async API to manage concurrency.
- Tune hybrid search weights (e.g., 0.7 vector, 0.3 text) for relevance.
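To make the hybrid weighting concrete, the sketch below combines k-NN and BM25 scores client-side. It min-max normalizes each result set so the two score scales are comparable, then takes a weighted sum using the 0.7 and 0.3 weights mentioned above. The function names and sample scores are made up for illustration. OpenSearch can also perform this kind of normalization and weighted combination server-side through a search pipeline with a normalization processor; which approach the original post uses is not stated here.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to the range [0, 1]; equal scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    vector_scores: dict[str, float],
    text_scores: dict[str, float],
    vector_weight: float = 0.7,
    text_weight: float = 0.3,
) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores, highest combined score first."""
    v = min_max_normalize(vector_scores)
    t = min_max_normalize(text_scores)
    combined = {
        # A document missing from one result set contributes 0 for that side.
        doc: vector_weight * v.get(doc, 0.0) + text_weight * t.get(doc, 0.0)
        for doc in set(v) | set(t)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    knn_hits = {"vid-a": 0.9, "vid-b": 0.5, "vid-c": 0.1}  # made-up scores
    bm25_hits = {"vid-b": 12.0, "vid-c": 3.0}              # made-up scores
    for doc, score in hybrid_rank(knn_hits, bm25_hits):
        print(doc, round(score, 3))
```

Raising the text weight favors exact tag matches, while raising the vector weight favors semantic similarity. The right balance depends on the query mix and is best set by evaluating against labeled queries.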
Topics
- Multimodal Video Search
- Amazon Nova
- Amazon OpenSearch Service
- Vector Embeddings
- Hybrid Search
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.