Pinterest cut AI costs 90% by gutting a frontier model's vision layer

· Source: VentureBeat · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, medium

Summary

Pinterest cut AI costs by 90% and raised accuracy by 30% for its visual recommendation system, which serves 620 million monthly users. CTO Matt Madrigal's team did this by extensively customizing the open-source Qwen3-VL model: they "ripped out" Qwen's original vision encoder layer, replaced it with proprietary multimodal embeddings, and fine-tuned the model on that data. Image metadata can therefore be precomputed offline and the model retrained continuously, so no image has to be encoded at runtime; encoding images at runtime had made inference latency 20 times worse. Pinterest also developed a "taste graph," a dynamic representation of individual user preferences that uses graph structures and representation learning, with constantly updated user embeddings, to guide personalized visual discovery from inspiration to purchase.
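To make the "constantly updated user embeddings" idea concrete, here is a minimal sketch. It is not Pinterest's implementation, and it leaves out the graph structure entirely. It assumes a user embedding is nudged toward each pin the user engages with via an exponential moving average, and that candidates are ranked by cosine similarity. The function names, the update rate, and the toy two-dimensional vectors are all hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def update_user_embedding(user: Vector, item: Vector, rate: float = 0.5) -> Vector:
    """Assumed update rule: exponential moving average toward the engaged item."""
    return [(1 - rate) * u + rate * i for u, i in zip(user, item)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(user: Vector, candidates: Dict[str, Vector]) -> List[str]:
    """Order candidate pin IDs from most to least similar to the user vector."""
    return sorted(candidates, key=lambda pid: cosine(user, candidates[pid]), reverse=True)


# Hypothetical pins in a toy 2-dimensional embedding space.
pins = {"boho_rug": [0.0, 1.0], "steel_desk": [1.0, 0.0]}

user = [1.0, 0.0]  # starts out aligned with "steel_desk"
user = update_user_embedding(user, pins["boho_rug"])  # first engagement
user = update_user_embedding(user, pins["boho_rug"])  # second engagement
ranked = rank(user, pins)
print(user, ranked)
```

After two engagements with the same pin, the user vector has drifted toward it and that pin ranks first. A production system would presumably learn the update rather than use a fixed rate.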

Key takeaway

AI engineers and MLOps teams scaling visual AI systems should consider deeply customizing open-source foundation models. Replacing a generic vision layer with proprietary multimodal embeddings can significantly reduce inference cost and latency, as Pinterest did with Qwen3-VL. The approach allows offline data precomputation and continuous retraining, which can improve accuracy and user engagement in high-volume applications.

Key insights

Customizing an open-source model with proprietary data and embeddings can sharply cut costs and improve performance for large-scale visual AI; Pinterest reports a 90% cost reduction and a 30% accuracy gain.

Method

Remove Qwen3-VL's vision encoder, replace it with proprietary multimodal embeddings, fine-tune the model on proprietary data, and precompute image metadata offline for visual discovery.
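The serving-side effect of this method can be sketched as follows. This is an illustration under stated assumptions, not Pinterest's code: the in-memory store, the function names, and the linear adapter that maps a precomputed embedding into the language model's hidden size are all hypothetical, since the source does not describe how the embeddings are wired into Qwen3-VL. The point is only that the runtime path becomes a lookup plus a small projection, with no image encoder call.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical offline store: pin ID -> precomputed proprietary embedding.
# A real system would use a feature store or key-value service instead.
EMBEDDING_STORE: Dict[str, Vector] = {}


def precompute_offline(pin_id: str, embedding: Vector) -> None:
    """Batch-job step: persist an embedding that was computed ahead of time."""
    EMBEDDING_STORE[pin_id] = embedding


def project(embedding: Vector, weights: List[Vector]) -> Vector:
    """Assumed linear adapter: one weight row per output (hidden) dimension."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]


def visual_tokens_at_runtime(pin_id: str, weights: List[Vector]) -> Vector:
    """Serving path: look up the stored embedding and project it.

    No image is decoded or encoded here, which is where the latency
    saving in this sketch comes from.
    """
    return project(EMBEDDING_STORE[pin_id], weights)


# Toy example: a 3-dimensional embedding projected into a 2-dimensional space.
precompute_offline("pin_123", [1.0, 2.0, 3.0])
adapter = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
tokens = visual_tokens_at_runtime("pin_123", adapter)
print(tokens)  # [1.0, 5.0]
```

In a fine-tuned model the adapter weights would be learned along with the rest of the network; they are fixed here only to keep the example checkable by hand.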

Best for: CTO, AI Architect, VP of Engineering/Data, AI Engineer, MLOps Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.