Answer synthesis in Foundry IQ: Quality metrics across 10,000 queries
Summary
Microsoft's answer synthesis feature, available in Foundry IQ and Azure AI Search, returns grounded, cited answers directly from the retrieval layer, which simplifies building RAG solutions. It generates natural language responses with inline citations and metadata for applications such as internal copilots and customer support bots. The system retrieves relevant content, synthesizes a response with an LLM (e.g., GPT-4.1-mini), and returns it together with a references array. Natural language instructions steer the output, and the feature can produce a partial answer when the retrieved content covers only part of the question. An evaluation of more than 10,000 queries across the Customer, Support, and Multi-industry, Multi-language (MIML) datasets reports, for MIML, 93.9% answer relevance, 87.4% groundedness, and 81.6% citation quality. Results vary by model: less powerful models such as gpt-4o-mini and gpt-4.1-nano score significantly lower.
Key takeaway
For AI Architects and NLP Engineers building RAG solutions, integrating answer synthesis via Foundry IQ or Azure AI Search can significantly improve answer quality and user experience. Applications receive automatically generated, cited, and steerable responses, which reduces orchestration complexity. Choose the LLM with care: less powerful models such as gpt-4o-mini can notably lower metrics such as answer relevance and groundedness.
Key insights
Answer synthesis in Foundry IQ delivers grounded, cited responses for RAG applications. It can be steered with natural language instructions and returns a partial answer when the retrieved content is incomplete.
Principles
- Prioritize user-provided instructions in LLM steering.
- Generate partial answers over no answers for user utility.
- Measure groundedness using atomic factual claims ("nuggets").
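
The nugget principle can be illustrated with a small sketch. It assumes groundedness is the fraction of an answer's nuggets that the retrieved sources support; the summary does not give the exact formula, so this is one plausible reading. The function names are hypothetical, and the substring check is only a toy stand-in for an LLM judge.

```python
from typing import Callable, List


def groundedness_score(
    nuggets: List[str],
    sources: List[str],
    is_supported: Callable[[str, List[str]], bool],
) -> float:
    """Return the fraction of nuggets that the judge marks as supported."""
    if not nuggets:
        return 0.0
    supported = sum(1 for nugget in nuggets if is_supported(nugget, sources))
    return supported / len(nuggets)


def substring_judge(nugget: str, sources: List[str]) -> bool:
    # Toy stand-in for an LLM judge (assumption): a nugget counts as
    # supported if it appears verbatim, ignoring case, in any source.
    return any(nugget.lower() in source.lower() for source in sources)


sources = ["The warranty lasts two years. Returns are accepted within 30 days."]
nuggets = [
    "the warranty lasts two years",
    "returns are accepted within 30 days",
    "shipping is free",  # not present in the sources
]

score = groundedness_score(nuggets, sources, substring_judge)
print(f"groundedness = {score:.2f}")  # two of three nuggets are supported
```

In a real evaluation the judge callable would wrap an LLM call that decides whether each nugget is entailed by the sources.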
Method
The system retrieves relevant content, uses an LLM to synthesize a response with inline citations, and returns the answer with a references array. LLMs act as judges for quality metrics like relevance and groundedness.
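
The retrieve, synthesize, and return flow described above might look roughly like the following sketch. Everything here is an assumption made for illustration: the word-overlap retriever, the `[ref_id:N]` citation format, the shape of the returned dictionary, and the `fake_llm` stub that stands in for a real model call. It does not reproduce the Foundry IQ implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Chunk:
    ref_id: int
    text: str


def retrieve(query: str, corpus: List[str], top_k: int = 2) -> List[Chunk]:
    # Toy retriever (assumption): rank documents by words shared with the query.
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return [Chunk(i, text) for i, text in enumerate(ranked[:top_k])]


def synthesize(query: str, chunks: List[Chunk], llm: Callable[[str], str]) -> Dict:
    # Present each chunk with its id so the model can cite it inline.
    context = "\n".join(f"[ref_id:{c.ref_id}] {c.text}" for c in chunks)
    prompt = (
        "Answer using only the sources and cite them as [ref_id:N].\n"
        f"{context}\nQuestion: {query}"
    )
    answer = llm(prompt)
    # Keep only the references the answer actually cites (a design choice here).
    cited = {int(n) for n in re.findall(r"\[ref_id:(\d+)\]", answer)}
    references = [
        {"id": c.ref_id, "content": c.text} for c in chunks if c.ref_id in cited
    ]
    return {"answer": answer, "references": references}


def fake_llm(prompt: str) -> str:
    # Stub in place of a real LLM call (assumption).
    return "Passwords are reset from the account page [ref_id:0]."


corpus = [
    "Reset your password from the account page.",
    "Invoices are emailed monthly.",
    "Contact support by chat.",
]
query = "how do I reset my password"
result = synthesize(query, retrieve(query, corpus), fake_llm)
print(result["answer"])
print(result["references"])
```

Filtering the references to cited chunks keeps every inline citation resolvable; a service could equally return all retrieved chunks.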
In practice
- Set the "generateAnswer" parameter in the agentic retrieval API.
- Provide natural language instructions for answer steering.
- Evaluate LLM performance across different models for RAG.
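
As a rough illustration of the steps above, the sketch below builds one request body per candidate model so the same query and instructions can be compared across models. Only the "generateAnswer" name comes from the text; the `query`, `instructions`, and `model` fields and the `build_retrieve_request` helper are hypothetical, and the real request schema should be taken from the Azure AI Search documentation.

```python
import json
from typing import Any, Dict, List, Optional


def build_retrieve_request(
    query: str,
    model: str,
    instructions: Optional[str] = None,
    generate_answer: bool = True,
) -> Dict[str, Any]:
    # "generateAnswer" is the parameter named in the text; every other
    # field name in this body is a placeholder (assumption).
    body: Dict[str, Any] = {
        "query": query,
        "model": model,
        "generateAnswer": generate_answer,
    }
    if instructions:
        body["instructions"] = instructions
    return body


# Model names mentioned in the summary, used here as comparison candidates.
models: List[str] = ["gpt-4.1-mini", "gpt-4o-mini", "gpt-4.1-nano"]

requests_by_model = {
    model: build_retrieve_request(
        query="What is our refund policy?",
        model=model,
        instructions="Answer in two sentences and cite every claim.",
    )
    for model in models
}

print(json.dumps(requests_by_model["gpt-4.1-mini"], indent=2))
```

Sending the same request set to each model and scoring the answers with the same judge keeps the comparison consistent.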
Topics
- Answer Synthesis
- Retrieval-Augmented Generation
- LLM Evaluation Metrics
- Azure AI Search
- Foundry IQ
Best for: AI Architect, NLP Engineer, CTO, Machine Learning Engineer, AI Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.