What’s the Best Way to Brainwash an LLM?

2026-05-13 · Source: Towards Data Science · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Advanced, long

Summary

An experiment investigated three Supervised Fine-Tuning (SFT) data formats for imbuing a small language model, Qwen3-4B-Instruct, with a persistent C-3PO persona without requiring a system prompt. The three formats tested were conversational demonstrations, first-person statements, and Wikipedia-style synthetic documents (SDF), each with 500 training examples generated by Claude. Fine-tuning was performed using LoRA (r=16) on a single GPU. Evaluation involved perplexity on held-out text and human-readable trait tagging of 30 model responses. The first-person statements model achieved the lowest perplexity on its own format (4.5) and generalized well to synthetic documents (5.4), demonstrating a deeper encoding of the persona. While all methods achieved surface-level fidelity, the first-person approach most effectively conveyed C-3PO's anxiety and overall emotional texture, outperforming demonstrations and SDF in trait coverage.

Key takeaway

For AI Engineers aiming to imbue LLMs with robust, default personas, prioritize training data composed of first-person statements. This approach fosters deeper persona internalization and better generalization across varied interaction contexts than conversational demonstrations or factual descriptions. While demonstrations suit fixed deployment scenarios, and synthetic documents excel at factual accuracy, first-person data best captures the emotional and behavioral nuances of a character, reducing reliance on explicit system prompts.

Key insights

First-person statements are most effective for deeply embedding a persona in LLMs via SFT, enabling better generalization.

Principles

First-person statements update self-representation.
Demonstrations update behavioral patterns.
Synthetic documents update world knowledge.

Method

Fine-tune a small LLM (Qwen3-4B-Instruct) with LoRA (r=16) using 500 examples across three data formats: demonstrations, first-person statements, and synthetic documents.

In practice

Use first-person statements for robust persona generalization.
Combine SDF with first-person for factual grounding and identity.
System prompts remain effective for many use cases.

Topics

Supervised Fine-Tuning
Persona Injection
Language Model Personalization
LoRA Fine-tuning
C-3PO Persona

Code references

feal-ca/C3PO-persona

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.