Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

A study investigated whether large language models (LLMs) can reliably express human-like personality, focusing on the Big Five framework. Researchers fine-tuned five models (LLaMA 3.2-3B, Gemma-2-2B, Gemma-7B, LLaMA 3.1-8B, and GPT-3.5) on a dataset of 2,467 long-form essays, each associated with a target Big Five personality profile. The stability and fidelity of the induced personality were evaluated using the IPIP-NEO questionnaire. Fine-tuning, whether supervised (SFT) or preference-based (DPO, ORPO), consistently reduced variance in questionnaire responses by 15% to 33% across all models, mitigating the evaluation fragility observed in pre-trained models. However, despite this improved stability and some gains in single-trait scores, accuracy on the full five-dimensional personality profile remained near chance: at most 9.38% over 32 possible combinations, against a 3.125% random baseline. This suggests that unguided essays lack sufficient cues for faithful personality expression. The study also found that safety alignment does not significantly affect induction efficacy.
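The arithmetic behind the baseline can be sketched as follows. This is an illustrative sketch, not the paper's evaluation code: it assumes each trait is binarized into high/low (which is what 32 combinations implies), and the function names are hypothetical. The last helper additionally assumes per-trait errors are independent, which the summary does not state, to show why single-trait gains need not carry over to the full profile.

```python
from itertools import product

TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

def profile_space(traits=TRAITS):
    """All high/low assignments over the traits (assumes binarized labels)."""
    return list(product(("low", "high"), repeat=len(traits)))

def chance_baseline(traits=TRAITS):
    """Probability of guessing the full profile uniformly at random."""
    return 1.0 / len(profile_space(traits))

def full_profile_accuracy(predicted, target):
    """Fraction of cases where every trait matches simultaneously."""
    hits = sum(1 for p, t in zip(predicted, target) if tuple(p) == tuple(t))
    return hits / len(target)

def joint_accuracy_if_independent(per_trait_accuracy, n_traits=5):
    """Full-profile accuracy if each trait were right independently
    with the same probability (an assumption made for illustration)."""
    return per_trait_accuracy ** n_traits

print(len(profile_space()))                 # number of distinct profiles
print(chance_baseline())                    # random-guess accuracy
print(joint_accuracy_if_independent(0.7))   # joint accuracy at 70% per trait
```

Under the independence assumption, even a per-trait accuracy well above 50% compounds into a low full-profile accuracy, because all five traits must be correct at once.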

Key takeaway

If you are an AI scientist or machine learning engineer developing personality-infused LLMs, recognize that fine-tuning improves response consistency, but current methods based on unguided text are insufficient for accurate, multi-dimensional personality induction. Prioritize scenario-grounded datasets or interactive elicitation techniques that accumulate test-aligned evidence over time, rather than relying on single-trait improvements.

Key insights

Fine-tuning LLMs improves personality questionnaire response stability but fails to induce accurate, multi-dimensional Big Five profiles from unguided text.

Method

LLMs were fine-tuned on essays labeled with Big Five profiles, then evaluated with the IPIP-NEO questionnaire for stability under prompt rephrasing and for profile accuracy across post-training methods (SFT, DPO, ORPO).
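One way to quantify stability under prompt rephrasing is sketched below. The summary does not specify the paper's exact variance metric, so this is an assumption: each questionnaire item is asked under several rephrasings, the population variance of the Likert scores is computed per item, and the result is averaged over items. The function names and the response values are hypothetical, for illustration only.

```python
from statistics import mean, pvariance

def mean_item_variance(responses):
    """responses maps an item id to its Likert scores, one per rephrasing.
    Returns the per-item variance across rephrasings, averaged over items."""
    return mean([pvariance(scores) for scores in responses.values()])

def relative_reduction(before, after):
    """Fractional drop in variance going from `before` to `after`."""
    return (before - after) / before

# Hypothetical scores (1-5 scale) for two items under three rephrasings.
base_model = {"item_1": [1, 5, 3], "item_2": [2, 4, 3]}
tuned_model = {"item_1": [2, 4, 3], "item_2": [3, 4, 3]}

reduction = relative_reduction(mean_item_variance(base_model),
                               mean_item_variance(tuned_model))
print(f"variance reduction: {reduction:.1%}")
```

A lower mean variance means the model gives more consistent answers when the same item is worded differently. It says nothing about whether those answers match the target profile, which is the gap the summary highlights.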

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.