In-Context Learning for the Imputation of Public Opinion Data with Large Language Models

2026-06-08 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A new study proposes using in-context learning (ICL) with large language models (LLMs) to impute missing public opinion data, addressing partial non-response, a common problem in surveys. The study systematically evaluates ICL design choices under three missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), using 150 opinion variables from 15 waves of the American Trends Panel. ICL consistently reduces absolute error relative to established statistical methods such as MICE with predictive mean matching (MICE PMM), with the largest gains under MNAR. The best-performing configuration, gpt-oss-120b with 100 in-context examples, achieves near-nominal aggregate coverage (approaching the 95% level) with confidence intervals two to five times narrower than those of MICE PMM. The authors also release a Python package with an sklearn-like API for deployment.
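The two interval metrics mentioned above, coverage and width, can be made concrete with a small sketch. The function name `coverage_and_width` and all numbers below are invented for illustration; they are not results or code from the study.

```python
import numpy as np

def coverage_and_width(lower, upper, truth):
    """Return (share of intervals that contain the truth, mean interval width)."""
    lower, upper, truth = map(np.asarray, (lower, upper, truth))
    covered = (lower <= truth) & (truth <= upper)
    return float(covered.mean()), float((upper - lower).mean())

# Toy aggregate estimates (e.g. share agreeing with an item); purely illustrative.
truth = np.array([0.40, 0.55, 0.62, 0.30])

# A "wide" method: intervals of +/- 0.10 around the truth.
wide_cov, wide_width = coverage_and_width(truth - 0.10, truth + 0.10, truth)

# A "narrow" method: tighter intervals, one of which misses the truth.
narrow_lo = np.array([0.37, 0.53, 0.60, 0.33])
narrow_hi = np.array([0.42, 0.58, 0.65, 0.38])
narrow_cov, narrow_width = coverage_and_width(narrow_lo, narrow_hi, truth)

# How many times narrower the second method's intervals are.
width_ratio = wide_width / narrow_width
```

Narrow intervals are only useful if coverage stays near the nominal level, which is why the summary reports both quantities together.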

Key takeaway

Data scientists and survey researchers who impute missing values in public opinion datasets should consider in-context learning (ICL) with LLMs. In the study, this approach yields lower absolute error than MICE PMM, particularly under non-random missingness (MNAR), along with substantially narrower confidence intervals. The released Python package offers a straightforward route to deploying the method.

Key insights

In the study's evaluation, in-context learning with LLMs imputes missing public opinion data with lower absolute error than established statistical methods such as MICE PMM.

Principles

Imputation fundamentally differs from prediction.
ICL can reduce imputation error, especially for non-random missingness.
Systematic evaluation of ICL design choices is crucial.

Method

Missing survey data is imputed through in-context learning, systematically evaluating ICL design choices across MCAR, MAR, and MNAR missingness mechanisms.
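The three missingness mechanisms can be sketched with a short simulation. This is a minimal illustration under assumed settings, not the study's masking procedure: the variables `age` and `opinion`, the logistic form, and all coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
age = rng.normal(50, 15, n)        # fully observed covariate (hypothetical)
opinion = rng.integers(1, 6, n)    # 1-5 Likert item to be masked (hypothetical)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# MCAR: every respondent has the same probability of skipping the item.
mcar = rng.random(n) < 0.2

# MAR: the probability depends only on an observed variable (age).
mar = rng.random(n) < sigmoid((age - 50) / 15 - 1.4)

# MNAR: the probability depends on the unobserved answer itself.
mnar = rng.random(n) < sigmoid((opinion - 3) - 1.4)

# Under MNAR the observed answers are no longer representative of all answers.
true_mean = opinion.mean()
observed_mean_mnar = opinion[~mnar].mean()
```

In this toy setup, respondents with higher answers are more likely to be missing under MNAR, so the mean of the observed answers understates the true mean. That bias is what makes MNAR the hardest case for imputation.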

In practice

Use gpt-oss-120b with 100 in-context examples, the best-performing configuration in the study.
Deploy the method via the provided Python package with an sklearn-like API.
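To illustrate what an sklearn-like workflow around in-context examples might look like, here is a minimal sketch. It is not the released package's API: the class `ICLImputerSketch`, its parameters, the prompt format, and the `majority_stub` function (a stand-in for a real LLM call) are all hypothetical.

```python
import random
from typing import Callable

class ICLImputerSketch:
    """Hypothetical fit/transform wrapper; NOT the released package's API."""

    def __init__(self, target: str, complete: Callable[[str], str],
                 n_examples: int = 100, seed: int = 0):
        self.target = target          # name of the survey item to impute
        self.complete = complete      # function mapping a prompt to an answer
        self.n_examples = n_examples  # number of in-context examples per prompt
        self.seed = seed

    def fit(self, rows):
        # Respondents who answered the target item supply the in-context examples.
        self.pool_ = [r for r in rows if r.get(self.target) is not None]
        return self

    def _format(self, row, with_answer):
        feats = "; ".join(f"{k}={v}" for k, v in row.items()
                          if k != self.target and v is not None)
        answer = row[self.target] if with_answer else ""
        return f"{feats} -> {self.target}: {answer}".rstrip()

    def build_prompt(self, row):
        # Sample up to n_examples answered rows, then append the unanswered query.
        rng = random.Random(self.seed)
        k = min(self.n_examples, len(self.pool_))
        lines = [self._format(e, True) for e in rng.sample(self.pool_, k)]
        lines.append(self._format(row, False))
        return "\n".join(lines)

    def transform(self, rows):
        out = []
        for r in rows:
            r = dict(r)  # copy so the caller's data is not mutated
            if r.get(self.target) is None:
                r[self.target] = self.complete(self.build_prompt(r))
            out.append(r)
        return out

def majority_stub(prompt: str) -> str:
    """Stand-in for an LLM call: returns the most common example answer."""
    answers = [line.rsplit(": ", 1)[1] for line in prompt.splitlines()[:-1]]
    return max(set(answers), key=answers.count)

rows = [
    {"party": "D", "approve": "yes"},
    {"party": "D", "approve": "yes"},
    {"party": "R", "approve": "no"},
    {"party": "R", "approve": None},   # item non-response to be imputed
]
imputer = ICLImputerSketch("approve", majority_stub, n_examples=3).fit(rows)
prompt = imputer.build_prompt(rows[3])
completed = imputer.transform(rows)
```

Passing the completion function in as an argument keeps the sketch runnable without any model; in a real deployment it would be replaced by a call to the chosen LLM.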

Topics

In-Context Learning
Data Imputation
Large Language Models
Public Opinion Data
Missing Data Mechanisms
Python Package

Best for: AI Scientist, Data Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.