Answer synthesis in Foundry IQ: Quality metrics across 10,000 queries

2026-01-20 · Source: Microsoft Foundry Blog articles · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, long

Summary

Microsoft's answer synthesis feature, available in Foundry IQ and Azure AI Search, returns grounded, cited answers directly from the retrieval layer, which simplifies building RAG solutions. It generates natural-language responses with inline citations and metadata, supporting applications such as internal copilots and customer support bots. The system retrieves relevant content, synthesizes a response with an LLM (e.g., GPT-4.1-mini), and returns it together with a references array. Answers can be steered with natural-language instructions, and the system can still produce a partial answer when the retrieved content covers only part of the question. An evaluation on more than 10,000 queries across the Customer, Support, and Multi-industry, Multi-language (MIML) datasets reports strong results. On MIML, for example, it reaches 93.9% answer relevance, 87.4% groundedness, and 81.6% citation quality. Performance varies by GPT model: less powerful models such as gpt-4o-mini and gpt-4.1-nano show significant drops.

Key takeaway

For AI Architects and NLP Engineers building RAG solutions, integrating answer synthesis through Foundry IQ or Azure AI Search can significantly improve answer quality and user experience. Your applications get automatically generated, cited, and steerable responses with less orchestration code. Be mindful of LLM choice: less powerful models such as gpt-4o-mini can noticeably lower metrics such as answer relevance and groundedness.

Key insights

Answer synthesis in Foundry IQ delivers grounded, cited responses and strengthens RAG applications with steerable answers, including partial answers when retrieval is incomplete.
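The steering and partial-answer behavior can be illustrated with a prompt-assembly sketch. The function build_synthesis_prompt, the Source type, and all prompt wording are hypothetical; the article summary does not publish the prompt Foundry IQ actually uses. The sketch numbers retrieved chunks so the model can cite them inline, puts user instructions in a section marked as taking priority, and asks for a partial answer instead of a refusal.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A retrieved chunk (hypothetical fields, not the service's schema)."""
    title: str
    text: str


def build_synthesis_prompt(question: str, sources: list[Source], instructions: str = "") -> str:
    """Assemble a hypothetical answer-synthesis prompt."""
    lines = [
        "Answer the question using only the numbered sources below.",
        "Cite every factual claim inline with its source number, e.g. [1].",
        # Partial answers over no answers: ask the model to answer what it can.
        "If the sources cover only part of the question, answer that part "
        "and say what is not covered instead of refusing.",
    ]
    # User-provided steering goes in its own section, marked as taking priority.
    if instructions.strip():
        lines += ["", "User instructions (take priority over the rules above):", instructions.strip()]
    lines += ["", "Sources:"]
    # Number sources from 1 so inline markers like [1] map back to them.
    for i, src in enumerate(sources, start=1):
        lines.append(f"[{i}] {src.title}: {src.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_synthesis_prompt(
        "What is the refund window?",
        [Source("Refund policy", "Refunds are accepted within 30 days of purchase.")],
        instructions="Answer in one sentence.",
    ))
```

Keeping user instructions in a separate, explicitly prioritized section is one simple way to express the principle that user-provided steering wins over default style rules.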

Principles

Prioritize user-provided instructions in LLM steering.
Generate partial answers over no answers for user utility.
Measure groundedness using atomic factual claims ("nuggets").
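A nugget-based groundedness score can be sketched as the fraction of an answer's atomic claims that a judge finds supported by the retrieved sources. This aggregation is an assumption; the exact scoring used in the evaluation is not given here. The names groundedness and mean_groundedness, and the is_supported callback (a stand-in for an LLM judge), are hypothetical.

```python
from typing import Callable


def groundedness(nuggets: list[str], is_supported: Callable[[str], bool]) -> float:
    """Fraction of an answer's atomic claims ("nuggets") judged as supported.

    `is_supported` stands in for an LLM judge that checks one nugget against
    the retrieved sources. An answer with no nuggets scores 0.0 here; that
    convention is an assumption.
    """
    if not nuggets:
        return 0.0
    supported = sum(1 for nugget in nuggets if is_supported(nugget))
    return supported / len(nuggets)


def mean_groundedness(per_answer: list[float]) -> float:
    """Dataset-level score, assumed here to be the mean of per-answer values."""
    return sum(per_answer) / len(per_answer) if per_answer else 0.0


if __name__ == "__main__":
    retrieved = "Refunds are accepted within 30 days. Shipping is free over $50."
    nuggets = [
        "Refunds are accepted within 30 days",
        "Shipping is free over $50",
        "Returns require a receipt",
    ]
    # Toy judge: substring match instead of an LLM call (2 of 3 nuggets pass).
    print(groundedness(nuggets, lambda n: n in retrieved))
```

Scoring at the nugget level means an answer that mixes supported and unsupported claims gets partial credit rather than a single pass/fail verdict.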

Method

The system retrieves relevant content, uses an LLM to synthesize a response with inline citations, and returns the answer with a references array. LLMs act as judges for quality metrics like relevance and groundedness.
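The citation step of this flow can be sketched as post-processing on the generated text: collect the inline markers, map them back to the retrieved sources, and emit a references array. The [n] marker format, the build_response name, and the output field names are assumptions for illustration, not the service's response schema.

```python
import re

# Inline citation markers such as [1] or [12] (the marker format is an assumption).
CITATION = re.compile(r"\[(\d+)\]")


def build_response(answer: str, source_titles: list[str]) -> dict:
    """Attach a references array listing only the sources the answer cites.

    Marker numbers are 1-based indexes into `source_titles`. References keep
    first-cited order; markers with no matching source are collected in
    `invalid_citations` so they can be flagged rather than silently dropped.
    """
    cited: list[int] = []
    invalid: list[int] = []
    for match in CITATION.finditer(answer):
        n = int(match.group(1))
        if 1 <= n <= len(source_titles):
            if n not in cited:
                cited.append(n)
        elif n not in invalid:
            invalid.append(n)
    return {
        "answer": answer,
        "references": [{"id": n, "title": source_titles[n - 1]} for n in cited],
        "invalid_citations": invalid,
    }


if __name__ == "__main__":
    resp = build_response(
        "Refunds are accepted within 30 days [1]. Shipping is free over $50 [2].",
        ["Refund policy", "Shipping policy"],
    )
    print(resp["references"])
```

Reporting out-of-range markers separately gives an evaluation harness a cheap signal for citation quality before any LLM judge is involved.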

In practice

Set the "generateAnswer" parameter in the agentic retrieval API.
Provide natural-language instructions to steer answers.
Evaluate RAG answer quality across different LLMs before choosing one.
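A per-model comparison of judge scores can be sketched as follows. compare_models is a hypothetical helper; it assumes every model answered the same queries and that each metric is a per-query score in [0, 1] from an LLM judge. It reports each model's average relative to a chosen baseline in percentage points, which makes drops for smaller models easy to spot.

```python
from statistics import mean


def compare_models(
    scores: dict[str, dict[str, list[float]]], baseline: str
) -> dict[str, dict[str, float]]:
    """Per model and metric, the gap to `baseline` in percentage points.

    `scores[model][metric]` holds per-query judge scores in [0, 1]; every
    model is assumed to be scored on the same query set.
    """
    # Average the per-query scores into one number per (model, metric).
    averages = {
        model: {metric: mean(vals) for metric, vals in metrics.items()}
        for model, metrics in scores.items()
    }
    base = averages[baseline]
    # Negative values mean the model scores below the baseline.
    return {
        model: {metric: round(100 * (avg - base[metric]), 1) for metric, avg in metrics.items()}
        for model, metrics in averages.items()
    }


if __name__ == "__main__":
    # Toy per-query scores for two placeholder models (not real results).
    toy = {
        "model-a": {"relevance": [1.0, 1.0, 1.0, 0.0], "groundedness": [1.0, 1.0, 0.5, 1.0]},
        "model-b": {"relevance": [1.0, 0.0, 1.0, 0.0], "groundedness": [1.0, 0.5, 0.5, 0.5]},
    }
    print(compare_models(toy, baseline="model-a"))
```

Comparing against a fixed baseline on an identical query set keeps differences attributable to the model rather than to query mix.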

Topics

Answer Synthesis
Retrieval-Augmented Generation
LLM Evaluation Metrics
Azure AI Search
Foundry IQ

Best for: AI Architect, NLP Engineer, CTO, Machine Learning Engineer, AI Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.