CoPA: Benchmarking Personalized Question Answering with Data-Informed Cognitive Factors
Summary
CoPA is a new benchmark for evaluating the personalized Question Answering (QA) capabilities of Large Language Models (LLMs). It addresses a limitation of existing evaluation paradigms, which often rely on lexical similarity or manual heuristics without sufficient data-driven validation. CoPA distills six key personalization factors by mining Community-Individual Preference Divergence (CIPD), that is, cases where an individual's choice overrides the community consensus. It includes 1,985 user profiles for fine-grained, factor-level assessment. By quantifying the alignment between model outputs and user-specific cognitive preferences inferred from interaction patterns, CoPA offers a more comprehensive and discriminative standard for evaluating personalized QA than generic metrics. The code for CoPA is publicly available on GitHub.
Key takeaway
For research scientists developing or deploying LLMs for personalized QA, CoPA provides a data-driven benchmark for assessing model performance beyond generic metrics. Integrating CoPA into your evaluation pipeline gives fine-grained, factor-level insight into how well your models align with individual users' cognitive preferences, which can help guide improvements in personalization.
Key insights
CoPA benchmarks personalized QA by measuring how well LLM outputs align with user-specific cognitive preferences inferred from each user's interaction patterns.
Principles
- Individual preferences often diverge from community consensus.
- Personalization requires data-driven cognitive factor evaluation.
Method
CoPA mines Community-Individual Preference Divergence (CIPD) to distill six personalization factors. It then quantifies alignment between model outputs and user-specific cognitive preferences inferred from interaction patterns.
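To make the idea of preference divergence concrete, here is a minimal Python sketch of how such cases might be flagged. It is not CoPA's actual mining procedure, which this summary does not describe. The `Answer` and `Question` types, the field names, and the rule "the user's chosen answer differs from the top-voted answer" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    answer_id: str
    community_votes: int  # assumed proxy for community consensus


@dataclass
class Question:
    question_id: str
    user_id: str
    answers: list[Answer]
    chosen_answer_id: str  # assumed: the answer the asking user preferred


def find_divergent_cases(questions):
    """Return (question_id, user_id, chosen_id, consensus_id) tuples for
    questions where the user's choice is not the top-voted answer."""
    divergent = []
    for q in questions:
        if not q.answers:
            continue
        # On ties, max() keeps the first top-voted answer in list order.
        consensus = max(q.answers, key=lambda a: a.community_votes)
        if consensus.answer_id != q.chosen_answer_id:
            divergent.append(
                (q.question_id, q.user_id, q.chosen_answer_id, consensus.answer_id)
            )
    return divergent


# Hypothetical toy data: u1 overrides the consensus, u2 agrees with it.
questions = [
    Question("q1", "u1", [Answer("a1", 10), Answer("a2", 3)], "a2"),
    Question("q2", "u2", [Answer("a3", 5), Answer("a4", 1)], "a3"),
]
divergent = find_divergent_cases(questions)
```

In this toy data only `q1` is flagged, because `u1` picked `a2` over the higher-voted `a1`. A real pipeline would also need a principled way to turn such cases into personalization factors, which is outside the scope of this sketch.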
In practice
- Use CoPA to evaluate LLMs for personalized QA.
- Analyze CIPD cases, where a user's choice departs from the community consensus, to identify user-specific preferences.
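As an illustration of what factor-level reporting could look like in an evaluation pipeline, the sketch below averages alignment scores per factor. It assumes you already have one alignment score per (user, factor) pair from some scoring step. The record layout, the placeholder factor names `factor_a` and `factor_b`, and the use of a plain mean are assumptions, not part of the CoPA codebase.

```python
from collections import defaultdict
from statistics import mean


def factor_level_report(records):
    """Average alignment scores per factor.

    records: iterable of (user_id, factor_name, score) tuples, where score
    is an assumed alignment value in [0, 1] from an upstream scoring step.
    """
    by_factor = defaultdict(list)
    for _user_id, factor, score in records:
        by_factor[factor].append(score)
    return {factor: mean(scores) for factor, scores in by_factor.items()}


# Hypothetical scores; the factor names are placeholders, not CoPA's factors.
records = [
    ("u1", "factor_a", 1.0),
    ("u2", "factor_a", 0.5),
    ("u1", "factor_b", 0.25),
]
report = factor_level_report(records)
```

Reporting one number per factor, rather than a single overall score, is what lets you see which aspects of personalization a model handles well and which it does not.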
Topics
- Personalized Question Answering
- CoPA Benchmark
- Community-Individual Preference Divergence
- Cognitive Factors
- Large Language Models
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.