Ask a Techspert: How does AI understand my visual searches?
Summary
Google has updated its visual search features, including Circle to Search and Lens, so that users can search for multiple objects within a single image at once. The update, described by Search Senior Engineering Director Dunia Barada and published on March 5, 2026, lets users search for an entire outfit or several components of a room design in one step rather than item by item. In AI Mode, Gemini models analyze the image together with the user's question and perform multi-object reasoning. The system then uses a "fan-out" technique to trigger multiple visual searches concurrently and synthesize their results, presenting a single cohesive response with relevant links in seconds. The feature is not limited to searches that start from an image: users can also begin with a text query and refine the results by selecting specific elements from the generated images.
Key takeaway
For Computer Vision Engineers building search or recommendation systems, Google's "fan-out" technique for multi-object visual search is a useful pattern to study. Consider running one sub-search per detected object in parallel and synthesizing the results into a single response, so that your system moves beyond single-item identification toward fuller scene understanding and more relevant results.
Key insights
Google's updated Visual Search uses Gemini models and a "fan-out" technique for simultaneous multi-object image analysis.
Principles
- Multimodal search enhances complex query understanding.
- AI models can perform multi-object reasoning.
- Fan-out technique enables simultaneous sub-searches.
Method
Gemini models analyze images and questions, perform multi-object reasoning, then use a "fan-out" technique to trigger multiple visual searches, synthesize results, and present a single cohesive response.
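The flow described above (detect several objects, fan out one sub-search per object, then merge the results) can be sketched in Python. This is only an illustration of the general pattern, not Google's implementation: the names detect_objects, visual_search, and fan_out_search are hypothetical, the detected objects are hard-coded, and the example.com links are placeholders standing in for real search results.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class DetectedObject:
    label: str                       # e.g. "jacket"
    box: tuple[int, int, int, int]   # (x, y, width, height) in pixels


@dataclass
class SearchResult:
    label: str
    links: list[str]


def detect_objects(image_id: str, question: str) -> list[DetectedObject]:
    """Hypothetical stand-in for the multi-object reasoning step.

    A real system would run a vision model here; this stub returns
    fixed objects so the sketch is runnable.
    """
    return [
        DetectedObject("jacket", (10, 20, 200, 300)),
        DetectedObject("sneakers", (40, 500, 150, 100)),
        DetectedObject("handbag", (260, 250, 90, 120)),
    ]


async def visual_search(obj: DetectedObject) -> SearchResult:
    """Hypothetical stand-in for a single visual sub-search."""
    await asyncio.sleep(0.01)  # simulate network latency
    return SearchResult(obj.label, [f"https://example.com/search?q={obj.label}"])


async def fan_out_search(image_id: str, question: str) -> dict[str, list[str]]:
    objects = detect_objects(image_id, question)
    # Fan out: start one sub-search per object and run them concurrently.
    results = await asyncio.gather(*(visual_search(o) for o in objects))
    # Synthesize: merge the per-object results into one response.
    return {r.label: r.links for r in results}


if __name__ == "__main__":
    response = asyncio.run(
        fan_out_search("outfit.jpg", "Where can I buy this outfit?")
    )
    for label, links in response.items():
        print(label, links)
```

Because the sub-searches run concurrently, total latency in this sketch is roughly that of the slowest sub-search rather than the sum of all of them, which is the main appeal of fanning out.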
In practice
- Use Circle to Search for entire outfits.
- Start with text, then refine with image selections.
- Identify multiple plants for care requirements.
Topics
- Visual Search
- Multimodal AI
- Gemini Models
- Fan-out Technique
- Google Lens
Best for: Computer Vision Engineer, AI Product Manager, Software Engineer, Tech Journalist
Editorial summary, takeaway, and curation by AIssential. Original article published by AI.