X+Slides: Benchmarking Audience-Conditioned Slide Generation
Summary
X+Slides is a new benchmark for evaluating audience-conditioned slide generation from source documents. It addresses a gap in existing benchmarks, which overlook the target audience as a critical factor. Built on a diverse corpus covering 113 topics and seven presentation scenes, X+Slides employs a dynamic evaluation framework using 8,133 deduplicated, source-grounded probes. It assigns audience-specific utility weights to these probes and reports four complementary metrics: Audience Coverage, Domain-wise Coverage, Efficiency, and Correctness. Experiments with DeepPresenter, SlideTailor, and NotebookLM show that current systems recover a substantial but incomplete share of audience-essential information. At τ_A=0.7, DeepPresenter achieved 0.714 Audience Coverage, SlideTailor 0.594, and NotebookLM 0.853. These results highlight clear grounding differences between systems and the need for source-grounded evaluation rather than evaluation based on visual quality or broad topic coverage.
Key takeaway
If you are an NLP engineer or research scientist developing LLM-based slide generation systems, integrate audience-conditioned evaluation into your development and testing workflows. Move beyond generic completeness metrics by incorporating audience-specific utility weights and rigorous source-grounding checks, as demonstrated by X+Slides. This approach helps you build systems that deliver relevant and accurate information, so that generated slides meet the specific needs of diverse professional audiences.
Key insights
Benchmarking LLM-generated slides requires audience-specific utility and source grounding, not just completeness or visual quality.
Principles
- Target audience dictates essential information.
- Visual quality does not imply factual support.
- Source-grounded evaluation is critical.
Method
X+Slides uses a dynamic evaluation framework with 8,133 source-grounded probes, applying audience-specific utility weights to measure Audience Coverage, Domain-wise Coverage, Efficiency, and Correctness.
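To make the idea of utility-weighted coverage concrete, here is a minimal sketch in Python. It is not the benchmark's actual formula, which this summary does not give. The `Probe` type, the `audience_coverage` function, and the reading of τ_A as a minimum utility for a probe to count as audience-essential are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical probe record; field names are illustrative only."""
    probe_id: str
    utility: float   # assumed audience-specific utility weight in [0, 1]
    recovered: bool  # whether the generated slides answer this probe


def audience_coverage(probes, tau_a=0.7):
    """Utility-weighted share of audience-essential probes that were recovered.

    Assumption: a probe is audience-essential when its utility is >= tau_a.
    """
    essential = [p for p in probes if p.utility >= tau_a]
    total = sum(p.utility for p in essential)
    if total == 0:
        # No essential probes for this audience: report zero coverage.
        return 0.0
    covered = sum(p.utility for p in essential if p.recovered)
    return covered / total


probes = [
    Probe("p1", utility=0.9, recovered=True),
    Probe("p2", utility=0.8, recovered=False),
    Probe("p3", utility=0.3, recovered=True),  # below tau_a, ignored
]
score = audience_coverage(probes, tau_a=0.7)
```

Under these assumptions, a low-utility probe such as `p3` does not affect the score even though the slides recover it. This mirrors the point that broad topic coverage is not the same as covering what a specific audience needs.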
In practice
- Integrate audience profiles into LLM prompts.
- Prioritize factual grounding in slide content.
- Develop metrics for information utility.
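As an illustration of the first point, an audience profile could be passed to a slide-generation prompt along the following lines. This is a sketch under stated assumptions: `AudienceProfile`, `build_slide_prompt`, and the prompt wording are hypothetical and are not taken from X+Slides or from any of the evaluated systems.

```python
from dataclasses import dataclass, field


@dataclass
class AudienceProfile:
    """Hypothetical audience description; fields are illustrative only."""
    role: str
    goals: list = field(default_factory=list)
    background: str = ""


def build_slide_prompt(profile, source_text, max_slides=10):
    """Assemble a plain-text prompt that conditions generation on the audience."""
    goals = "\n".join(f"- {g}" for g in profile.goals) or "- (none stated)"
    return (
        f"Create at most {max_slides} slides from the source document below.\n"
        f"Target audience: {profile.role}\n"
        f"Audience background: {profile.background or 'not specified'}\n"
        f"Audience goals:\n{goals}\n"
        "Include only claims supported by the source document.\n\n"
        f"SOURCE DOCUMENT:\n{source_text}"
    )


profile = AudienceProfile(
    role="clinician",
    goals=["understand patient-facing risks", "see the evidence strength"],
    background="medical training, little machine learning experience",
)
prompt = build_slide_prompt(profile, "Example source text.", max_slides=8)
```

Keeping the profile as structured data rather than free text makes it easier to reuse the same fields later when weighting evaluation probes for that audience.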
Topics
- Large Language Models
- Slide Generation
- Benchmarking
- Audience Conditioning
- Evaluation Metrics
- Natural Language Processing
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.