Vibe Coding with Kev: I Built an Outfit Recommender and All I Got Was Combat Boots With a Sundress
Summary
An attempt to build an AI-powered outfit recommender, extending an earlier visual search project, ran into serious problems even though each component worked correctly on its own. The system used Meta's Segment Anything Model (SAM) to isolate individual garments in editorial lookbook photos, then used FashionCLIP to match each garment to products in a catalog and assemble outfit recommendations. SAM segmented thousands of high-quality items from a range of brand lookbooks, and FashionCLIP reliably found visually similar products, yet the recommendation engine still failed to produce coherent outfits. The core problem was that the system could not distinguish items that merely co-occur in a photo from items that are stylistically compatible, which led to illogical pairings such as a sundress with combat boots. The project showed that domain knowledge, such as styling rules and occasion context, is essential and cannot be derived from visual embeddings alone.
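To make the failure mode concrete, here is a minimal sketch of a purely co-occurrence-based recommender. It is an illustration, not the article's code: the photo data, category names, and the functions `cooccurrence_counts` and `recommend_partner` are hypothetical. The point is only that counting which garments appear together rewards whatever an editorial stylist happened to combine in a shot.

```python
from collections import Counter
from itertools import permutations

# Hypothetical data: garment categories found in each lookbook photo.
# In the described system these would come from SAM segments matched via FashionCLIP.
lookbook_photos = [
    {"sundress", "combat boots", "denim jacket"},  # an editorial "against type" styling
    {"sundress", "combat boots"},
    {"sundress", "sandals"},
    {"blazer", "trousers", "loafers"},
]

def cooccurrence_counts(photos):
    """Count ordered (item, partner) pairs that appear in the same photo."""
    counts = Counter()
    for items in photos:
        for a, b in permutations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts

def recommend_partner(item, counts):
    """Naive recommender: return the partner seen most often with `item`, or None."""
    partners = {b: n for (a, b), n in counts.items() if a == item}
    return max(partners, key=partners.get) if partners else None

counts = cooccurrence_counts(lookbook_photos)
print(recommend_partner("sundress", counts))  # → combat boots
```

Two editorial shots are enough to make combat boots the "best" match for a sundress, because the counts encode what was photographed together, not what a shopper would want to wear together.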
Key takeaway
For machine learning engineers building recommendation systems: purely visual or co-occurrence-based approaches often fall short on tasks that require nuanced domain knowledge. Each component of a pipeline can work perfectly in isolation, but integrating human expertise or structured domain rules is what bridges the gap between technical functionality and genuinely useful output. Consider how to encode "what goes with what" beyond simple visual similarity.
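One way (among several) to encode "what goes with what" is an explicit rule layer that vets candidates before any visual ranking. The sketch below is an illustrative assumption, not the original article's approach: the `Item` fields, occasion tags, and `INCOMPATIBLE` pairs are hypothetical stand-ins for rules a stylist would supply.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    name: str
    category: str
    occasions: frozenset  # hypothetical occasion tags, e.g. {"beach", "brunch"}

# Hypothetical styling rules a domain expert might write down.
INCOMPATIBLE = {
    frozenset({"sundress", "combat boots"}),
    frozenset({"swim shorts", "blazer"}),
}

def compatible(a: Item, b: Item, occasion: str) -> bool:
    """True if both items suit the occasion and no explicit rule forbids the pair."""
    if occasion not in a.occasions or occasion not in b.occasions:
        return False
    return frozenset({a.category, b.category}) not in INCOMPATIBLE

def filter_candidates(anchor: Item, candidates, occasion: str):
    """Keep only rule-approved candidates; visual ranking can happen afterwards."""
    return [c for c in candidates if compatible(anchor, c, occasion)]

dress = Item("Linen sundress", "sundress", frozenset({"beach", "brunch"}))
boots = Item("Leather combat boots", "combat boots", frozenset({"brunch", "concert"}))
sandals = Item("Flat sandals", "sandals", frozenset({"beach", "brunch"}))
print([c.name for c in filter_candidates(dress, [boots, sandals], "brunch")])  # → ['Flat sandals']
```

Hard-coded pairs do not scale to a full catalog, so in practice such rules might be replaced or supplemented by a learned compatibility model; the sketch only shows where that knowledge would plug in.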
Key insights
Visual embeddings alone cannot capture complex domain knowledge like fashion compatibility for outfit recommendations.
Principles
- Co-occurrence does not imply compatibility.
- Domain knowledge is essential for practical AI applications.
Method
The system used Meta's SAM for image segmentation and FashionCLIP for visual matching, attempting to generate outfit recommendations by pairing segmented items from lookbook images with catalog products.
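Between segmentation and matching, each SAM mask has to become a garment crop. SAM's automatic mask generator (`SamAutomaticMaskGenerator.generate` in the `segment_anything` package) returns one dict per mask with keys such as `area`, `bbox` (in XYWH format), `predicted_iou`, and `stability_score`. The sketch below assumes that record shape but uses hand-written toy values; the thresholds and the function names `keep_garment_masks` and `bbox_to_crop_box` are hypothetical choices, not the article's settings.

```python
# Toy records mimicking entries returned by SamAutomaticMaskGenerator.generate(image).
masks = [
    {"area": 52_000, "bbox": [120, 80, 200, 400], "predicted_iou": 0.96, "stability_score": 0.97},  # dress
    {"area": 900, "bbox": [10, 10, 30, 30], "predicted_iou": 0.91, "stability_score": 0.95},  # tiny fragment
    {"area": 30_000, "bbox": [140, 470, 160, 120], "predicted_iou": 0.72, "stability_score": 0.80},  # low confidence
]

def keep_garment_masks(masks, image_area, min_iou=0.88, min_stability=0.92, min_area_frac=0.01):
    """Hypothetical post-filter: drop low-confidence masks and masks too small to be garments."""
    return [
        m for m in masks
        if m["predicted_iou"] >= min_iou
        and m["stability_score"] >= min_stability
        and m["area"] / image_area >= min_area_frac
    ]

def bbox_to_crop_box(bbox):
    """Convert an XYWH bbox to a (left, upper, right, lower) box, the form PIL's Image.crop takes."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)

image_area = 640 * 960  # toy image size
crops = [bbox_to_crop_box(m["bbox"]) for m in keep_garment_masks(masks, image_area)]
print(crops)  # → [(120, 80, 320, 480)]
```

Each surviving crop would then be embedded with FashionCLIP and matched against the catalog.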
In practice
- Use SAM for high-quality image segmentation.
- Apply FashionCLIP for visual similarity search.
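The visual similarity search step can be sketched as brute-force cosine similarity between a garment-crop embedding and precomputed catalog embeddings. This is an assumption about how matching works, written with the standard library only: the three-dimensional vectors and SKU names are toy stand-ins for real FashionCLIP embeddings, which are much higher-dimensional, and a production system would likely use an approximate nearest-neighbor index instead of a linear scan.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k_matches(query, catalog, k=2):
    """Rank catalog SKUs by cosine similarity to the query embedding (linear scan)."""
    scored = [(cosine(query, emb), sku) for sku, emb in catalog.items()]
    scored.sort(reverse=True)
    return [sku for _, sku in scored[:k]]

# Toy vectors standing in for FashionCLIP image embeddings.
catalog = {
    "SKU-floral-midi": [0.9, 0.1, 0.0],
    "SKU-linen-sundress": [0.8, 0.3, 0.1],
    "SKU-combat-boot": [0.0, 0.2, 0.95],
}
query = [0.9, 0.15, 0.0]  # embedding of a segmented dress crop
print(top_k_matches(query, catalog))  # → ['SKU-floral-midi', 'SKU-linen-sundress']
```

The boots rank last only because they look different from the dress. Similarity search answers "what looks like this?", which is why it works well for visual search and says nothing about whether two items belong in the same outfit.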
Topics
- Outfit Recommender Systems
- Visual Search
- Segment Anything Model
- FashionCLIP
- AI Domain Expertise
Best for: Machine Learning Engineer, Data Scientist, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Data Science on Medium.