Improved data processing features in Foundry IQ: Richer content extraction and data enrichment

· Source: Microsoft Foundry Blog articles · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Cloud Computing & IT Infrastructure · Depth: Intermediate, short

Summary

The Microsoft Foundry IQ (Azure AI Search) 2026-05-01-preview release adds data processing enhancements for enterprise retrieval systems. SharePoint indexing now covers modern ASPX site pages and SharePoint Lists in addition to document libraries, with recursive subsite discovery. Content Understanding, delivered through Foundry Tools, adds semantic chunking that respects document structure and AI-generated image descriptions that convert visual content into retrievable text; both are activated by setting contentExtractionMode to "standard". Image serving preserves extracted images and makes them available at retrieval time, so models can reason over visual information. The release also adds Azure API Management endpoint support for Azure OpenAI skills and private connectivity for secure model communication. Separately, the 2026-04-01 REST API made several capabilities generally available, including the GenAI Prompt skill and enhanced Content Understanding.

Key takeaway

AI engineers and MLOps teams building enterprise RAG or agentic retrieval systems should explore the Foundry IQ 2026-05-01-preview features now. Expanded SharePoint indexing, semantic chunking, and image serving improve grounding and answer quality by preserving document structure and visual context. They also reduce custom engineering effort and make retrieval systems more reliable.

Key insights

Improved data pipelines in Foundry IQ enhance RAG and agentic retrieval by preserving content structure and visual context.

Method

Enable Content Understanding by setting the contentExtractionMode property to "standard" on file-based indexed knowledge sources (Azure Blob, SharePoint, OneLake). This turns on semantic chunking and AI-generated image descriptions.
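As a rough illustration of the setting above, the sketch below builds a knowledge source definition as a plain Python dictionary and serializes it to JSON. Only the contentExtractionMode property and its "standard" value come from the source. The helper name build_knowledge_source, the "name" and "kind" fields, the kind labels, and the placement of contentExtractionMode at the top level are all assumptions made for illustration; check the 2026-05-01-preview REST reference for the actual schema before using anything like this.

```python
import json

# File-based knowledge source types named in the summary. The string labels
# here are illustrative assumptions, not confirmed API enum values.
FILE_BASED_KINDS = {"azureBlob", "sharePoint", "oneLake"}


def build_knowledge_source(name: str, kind: str, extraction_mode: str = "standard") -> dict:
    """Return a hypothetical knowledge source definition as a plain dict.

    Only "contentExtractionMode" and the value "standard" come from the
    source; every other field name and the overall shape are assumptions.
    """
    if kind not in FILE_BASED_KINDS:
        raise ValueError(f"{kind!r} is not a file-based knowledge source kind")
    return {
        "name": name,  # assumed field
        "kind": kind,  # assumed field
        # Property and value named in the source; placement is an assumption.
        "contentExtractionMode": extraction_mode,
    }


if __name__ == "__main__":
    payload = build_knowledge_source("hr-policies", "sharePoint")
    print(json.dumps(payload, indent=2))
```

Keeping the definition as data makes it easy to review in source control and to diff when the preview schema changes.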

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, MLOps Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.