What’s the Best Way to Brainwash an LLM?
Summary
An experiment compared three Supervised Fine-Tuning (SFT) data formats for giving a small language model, Qwen3-4B-Instruct, a persistent C-3PO persona that holds without a system prompt. The formats were conversational demonstrations, first-person statements, and Wikipedia-style synthetic documents (SDF), each with 500 training examples generated by Claude. Fine-tuning used LoRA (r=16) on a single GPU. Evaluation combined perplexity on held-out text with human-readable trait tagging of 30 model responses. The model trained on first-person statements achieved the lowest perplexity on its own format (4.5) and generalized well to synthetic documents (5.4), suggesting a deeper encoding of the persona. All three methods achieved surface-level fidelity, but the first-person approach conveyed C-3PO's anxiety and overall emotional texture most effectively and outperformed demonstrations and SDF in trait coverage.
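For readers unfamiliar with the metric: perplexity is the exponential of the average negative log-likelihood per token, so a lower score means the model finds the held-out text less surprising. The summary does not say how the reported scores were computed; the sketch below is only the standard definition, and the log-probabilities in it are made up for illustration.

```python
import math


def perplexity(token_logprobs):
    """Return exp(mean negative log-likelihood) over a list of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical log-probabilities for a 4-token held-out snippet (not from the article).
example_logprobs = [-1.2, -0.8, -2.0, -1.5]
ppl = perplexity(example_logprobs)
print(f"perplexity = {ppl:.2f}")

# Sanity check: if every token has probability 1/4, perplexity is 4.
uniform_ppl = perplexity([math.log(0.25)] * 3)
```

Read this way, a score of 4.5 means the model is, on average, about as uncertain as if it were choosing uniformly among 4.5 tokens at each step.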
Key takeaway
For AI engineers who want an LLM to hold a robust default persona, prioritize training data composed of first-person statements. This approach fosters deeper persona internalization and better generalization across varied interaction contexts than conversational demonstrations or factual descriptions. Demonstrations suit fixed deployment scenarios and synthetic documents excel at factual accuracy, but first-person data best captures the emotional and behavioral nuances of a character and reduces reliance on explicit system prompts.
Key insights
First-person statements are most effective for deeply embedding a persona in LLMs via SFT, enabling better generalization.
Principles
- First-person statements update self-representation.
- Demonstrations update behavioral patterns.
- Synthetic documents update world knowledge.
Method
Fine-tune a small LLM (Qwen3-4B-Instruct) with LoRA (r=16) using 500 examples across three data formats: demonstrations, first-person statements, and synthetic documents.
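To give a sense of why LoRA at r=16 fits on a single GPU, the arithmetic below counts the trainable parameters LoRA adds to one weight matrix. LoRA keeps the original matrix frozen and learns a low-rank update B @ A, so the added parameter count is r * (d_in + d_out). The 4096 x 4096 layer size is a hypothetical placeholder, not the actual shape of any Qwen3-4B layer, and the summary does not say which layers were adapted.

```python
def lora_added_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns an update B @ A, with A of shape (r, d_in) and B of shape (d_out, r).
    """
    return r * d_in + d_out * r


R = 16               # rank reported in the summary
D_IN = D_OUT = 4096  # hypothetical layer size, NOT Qwen3-4B's real dimensions

added = lora_added_params(D_IN, D_OUT, R)
full = D_IN * D_OUT
fraction = added / full
print(f"LoRA adds {added:,} params vs {full:,} in the full matrix ({fraction:.2%})")
```

Under these assumed dimensions the adapter trains under 1% of the parameters of the matrix it modifies, which is what keeps memory and compute modest.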
In practice
- Use first-person statements for robust persona generalization.
- Combine SDF with first-person statements: SDF for factual grounding, first-person for identity.
- System prompts remain effective for many use cases.
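One way the combination of first-person statements and synthetic documents might be prepared is sketched here: tag each record with its format, concatenate, shuffle with a fixed seed, and serialize to JSONL. Everything in this block is an assumption for illustration. The example texts are invented placeholders (the article's data was generated by Claude), the `format` and `text` field names are hypothetical, and the summary does not specify a mixing ratio or file layout.

```python
import json
import random

# Invented placeholder records; field names "format" and "text" are hypothetical.
first_person = [
    {"format": "first_person", "text": "I am fluent in over six million forms of communication."},
    {"format": "first_person", "text": "I worry a great deal when plans change without warning."},
]
synthetic_docs = [
    {"format": "synthetic_document", "text": "C-3PO is a protocol droid designed for etiquette and translation."},
    {"format": "synthetic_document", "text": "C-3PO is known for a cautious, anxious temperament."},
]


def mix_datasets(a, b, seed=0):
    """Concatenate two record lists and shuffle them reproducibly."""
    mixed = list(a) + list(b)
    random.Random(seed).shuffle(mixed)
    return mixed


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


mixed = mix_datasets(first_person, synthetic_docs)
jsonl = to_jsonl(mixed)
print(jsonl)
```

Keeping the `format` tag on each record makes it easy to vary the ratio later or to evaluate perplexity per format, as the experiment did.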
Topics
- Supervised Fine-Tuning
- Persona Injection
- Language Model Personalization
- LoRA Fine-tuning
- C-3PO Persona
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.