X+Slides: Benchmarking Audience-Conditioned Slide Generation

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

X+Slides is a new benchmark for evaluating audience-conditioned slide generation from source documents. It addresses a gap in existing benchmarks, which overlook the target audience as a critical factor. Built on a diverse corpus covering 113 topics and seven presentation scenes, X+Slides uses a dynamic evaluation framework with 8,133 deduplicated, source-grounded probes. It applies audience-specific utility weights and reports four complementary metrics: Audience Coverage, Domain-wise Coverage, Efficiency, and Correctness. Experiments with DeepPresenter, SlideTailor, and NotebookLM show that current systems recover a substantial but incomplete share of audience-essential information. At τ_A=0.7, Audience Coverage is 0.714 for DeepPresenter, 0.594 for SlideTailor, and 0.853 for NotebookLM. The results point to clear differences in grounding across systems and to the need for source-grounded evaluation rather than evaluation based on visual quality or broad topic coverage.

Key takeaway

NLP engineers and research scientists building LLM-based slide generation systems should integrate audience-conditioned evaluation into their development and testing workflows. Move beyond generic completeness metrics by incorporating audience-specific utility weights and source-grounding checks, as demonstrated by X+Slides. Doing so helps ensure that generated slides carry the information each target audience needs and state it accurately relative to the source.

Key insights

Benchmarking LLM-generated slides requires audience-specific utility and source grounding, not just completeness or visual quality.

Principles

Method

X+Slides uses a dynamic evaluation framework with 8,133 source-grounded probes, applying audience-specific utility weights to measure Audience Coverage, Domain-wise Coverage, Efficiency, and Correctness.
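The following is a minimal sketch of how utility-weighted probe scoring of this kind could work. It is not the benchmark's implementation: this summary does not give the metric formulas, so the `Probe` schema, the reading of τ_A as a cutoff on utility, and all four function definitions are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One source-grounded probe (hypothetical schema, not from the paper)."""
    domain: str      # content domain the probe belongs to
    utility: float   # audience-specific utility weight, assumed in [0, 1]
    answered: bool   # the deck contains the information the probe asks for
    correct: bool    # the deck's answer agrees with the source document


def audience_coverage(probes, tau_a=0.7):
    """Assumed definition: utility-weighted share of probes with
    utility >= tau_a that the deck answers."""
    essential = [p for p in probes if p.utility >= tau_a]
    total = sum(p.utility for p in essential)
    if total == 0:
        return 0.0
    return sum(p.utility for p in essential if p.answered) / total


def domain_coverage(probes):
    """Assumed definition: utility-weighted coverage computed per domain."""
    total = defaultdict(float)
    covered = defaultdict(float)
    for p in probes:
        total[p.domain] += p.utility
        if p.answered:
            covered[p.domain] += p.utility
    return {d: covered[d] / total[d] for d in total if total[d] > 0}


def efficiency(probes, n_slides):
    """Assumed definition: covered utility per slide."""
    if n_slides <= 0:
        raise ValueError("n_slides must be positive")
    return sum(p.utility for p in probes if p.answered) / n_slides


def correctness(probes):
    """Assumed definition: share of answered probes that match the source."""
    answered = [p for p in probes if p.answered]
    if not answered:
        return 0.0
    return sum(p.correct for p in answered) / len(answered)
```

Under these assumptions, a deck that answers many low-utility probes but misses the high-utility ones scores well on generic completeness and poorly on `audience_coverage`, which is the distinction the benchmark is meant to expose.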

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.