Pinterest cut AI costs 90% by gutting a frontier model's vision layer
Summary
Pinterest cut AI costs by 90% and raised accuracy by 30% for its visual recommendation system, which serves 620 million monthly users. CTO Matt Madrigal's team did this by extensively customizing the open-source Qwen3-VL model. They "ripped out" Qwen's original vision encoder layer, replaced it with proprietary multimodal embeddings, and fine-tuned the model on this unique data. This lets Pinterest precompute image metadata offline and retrain continuously, so it no longer has to encode each image at runtime, a step that had made inference latency 20 times worse. Pinterest also developed a "taste graph," a dynamic representation of individual user preferences. It combines graph structures and representation learning with constantly updated user embeddings to guide personalized visual discovery from inspiration to purchase.
Key takeaway
AI engineers and MLOps teams scaling visual AI systems should consider deeply customizing open-source foundation models. Replacing a generic vision layer with proprietary multimodal embeddings can significantly reduce inference cost and latency, as Pinterest did with Qwen3-VL. The approach allows offline precomputation and continuous retraining, which Pinterest credits with improving accuracy and user engagement in a high-volume application.
Key insights
Customizing open-source models with proprietary data and embeddings drastically cuts costs and improves performance for large-scale visual AI.
Principles
- Data quality outweighs model size for unique use cases.
- Open-source models allow deep customization for specific needs.
- Precomputing embeddings offline improves runtime inference.
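The third principle can be illustrated with a minimal sketch. It is not Pinterest's pipeline: `encode_image`, `EmbeddingStore`, and the 8-dimensional vectors are hypothetical stand-ins, and the "encoder" is a hash used only so the example runs without a model. The point is that the expensive encoding step runs once per image in an offline batch, while the runtime path is a plain lookup.

```python
import hashlib

EMBED_DIM = 8  # hypothetical size; production embeddings are far larger


def encode_image(image_bytes: bytes) -> list[float]:
    """Stand-in for an expensive image encoder (hypothetical, hash-based)."""
    digest = hashlib.sha256(image_bytes).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]


class EmbeddingStore:
    """Holds embeddings computed offline so serving never calls the encoder."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}
        self.encoder_calls = 0

    def precompute(self, catalog: dict[str, bytes]) -> None:
        """Offline batch job: encode every image exactly once."""
        for image_id, image_bytes in catalog.items():
            self._vectors[image_id] = encode_image(image_bytes)
            self.encoder_calls += 1

    def lookup(self, image_id: str) -> list[float]:
        """Runtime path: a dictionary read, with no encoder call."""
        return self._vectors[image_id]


catalog = {"pin-1": b"sofa", "pin-2": b"lamp"}
store = EmbeddingStore()
store.precompute(catalog)

# Simulate 1,000 runtime requests; the encoder call count does not grow.
for _ in range(1000):
    store.lookup("pin-1")
print(store.encoder_calls)  # 2
```

In a real system the store would be a key-value service or feature store rather than an in-process dictionary, but the cost shape is the same: encoder work scales with catalog size, not with request volume.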
Method
Gut Qwen3-VL's vision encoder, replace with proprietary multimodal embeddings, fine-tune on unique data, and precompute metadata offline for visual discovery.
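The summary does not say how the proprietary embeddings are wired into the model. One common pattern, assumed here purely for illustration, is a learned linear projection that maps each precomputed embedding into the language model's input space so it can take the place of the vision encoder's output tokens. The dimensions and the names `make_projection`, `project`, and `build_input_sequence` are hypothetical, and the weights are random rather than fine-tuned.

```python
import random

PIN_DIM = 4    # hypothetical size of a precomputed pin embedding
MODEL_DIM = 6  # hypothetical hidden size of the language model


def make_projection(in_dim: int, out_dim: int, seed: int = 0) -> list[list[float]]:
    """Randomly initialised projection matrix; fine-tuning would learn these weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights: list[list[float]], vector: list[float]) -> list[float]:
    """Matrix-vector product: map a pin embedding into the model's input space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def build_input_sequence(pin_embedding, text_token_embeddings, weights):
    """Prepend the projected pin embedding where vision-encoder tokens would have gone."""
    visual_token = project(weights, pin_embedding)
    return [visual_token] + text_token_embeddings


weights = make_projection(PIN_DIM, MODEL_DIM)
pin_embedding = [0.1, 0.4, -0.2, 0.7]         # read from the offline store
text_tokens = [[0.0] * MODEL_DIM for _ in range(3)]  # placeholder text embeddings

sequence = build_input_sequence(pin_embedding, text_tokens, weights)
print(len(sequence), len(sequence[0]))  # 4 6
```

Under this assumption, no image pixels are processed at inference time: the model only sees a small vector that was computed offline and then projected.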
In practice
- Replace generic vision layers with custom embeddings.
- Develop a dynamic "taste graph" for user preferences.
- Benchmark continuously for engagement and performance.
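As a rough sketch of the "constantly updated user embeddings" idea, the example below moves a user vector toward each pin the user engages with and then ranks candidates by cosine similarity. It covers only the embedding update, not the graph structure, and the update rule (an exponential moving average), the `rate` value, and the toy 3-dimensional pin vectors are all assumptions rather than details from the article.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def update_user_embedding(user_vec, pin_vec, rate=0.2):
    """Exponential moving average: nudge the user vector toward an engaged pin."""
    return [(1 - rate) * u + rate * p for u, p in zip(user_vec, pin_vec)]


def rank(user_vec, candidates):
    """Order candidate pin ids from most to least similar to the user vector."""
    return sorted(candidates, key=lambda pid: cosine(user_vec, candidates[pid]), reverse=True)


pins = {
    "mid-century-chair": [1.0, 0.0, 0.0],
    "hiking-boots": [0.0, 1.0, 0.0],
    "walnut-desk": [0.9, 0.1, 0.0],
}

user = [0.0, 0.0, 1.0]  # starting preference vector
for engaged in ["mid-century-chair", "walnut-desk"]:
    user = update_user_embedding(user, pins[engaged])

ranking = rank(user, pins)
print(ranking)  # furniture pins rank ahead of "hiking-boots"
```

After two furniture engagements the user vector leans toward the furniture direction, so the unrelated pin falls to the bottom of the ranking.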
Topics
- AI Cost Optimization
- Open-Source Model Customization
- Visual Discovery
- Multimodal Embeddings
- Qwen3-VL
- Taste Graph
Best for: CTO, AI Architect, VP of Engineering/Data, AI Engineer, MLOps Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.