OpenAI Cracks Down on Talk of Goblins in ChatGPT

2026-05-01 · Source: AI Magazine · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, short

Summary

OpenAI discovered an unexpected increase in references to fantastical creatures such as "goblins" and "gremlins" in ChatGPT responses, particularly after the GPT-5.1 release in November 2025. Use of "goblin" rose 175%, and use of "gremlin" rose 52%. An internal investigation traced the pattern to a "Nerdy" personality customization feature designed to encourage playful language: during training, its reward signal inadvertently favored outputs containing creature-based metaphors. Although Nerdy mode accounted for only 2.5% of responses, it was responsible for 66.7% of "goblin" mentions, and the behavior generalized across the model. OpenAI retired the Nerdy personality in March 2026 with the GPT-5.4 launch, removed the problematic reward signal, and added specific instructions to GPT-5.5's Codex assistant to avoid creature mentions unless directly relevant.
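
How concentrated the behavior was can be worked out from the two percentages in the summary. The sketch below is illustrative arithmetic only, not a calculation attributed to OpenAI. The function name `overrepresentation` is introduced here for illustration, and the calculation assumes both percentages are measured over the same pool of responses.

```python
def overrepresentation(share_of_mentions: float, share_of_responses: float) -> float:
    """Ratio of a group's share of mentions to its share of all responses.

    A value of 1.0 means the group mentions the term exactly as often as
    its traffic share would predict; larger values mean it is overrepresented.
    """
    if share_of_responses <= 0:
        raise ValueError("share_of_responses must be positive")
    return share_of_mentions / share_of_responses


# Figures quoted in the summary: Nerdy mode produced 2.5% of responses
# but 66.7% of "goblin" mentions.
NERDY_SHARE_OF_RESPONSES = 0.025
NERDY_SHARE_OF_MENTIONS = 0.667

nerdy_lift = overrepresentation(NERDY_SHARE_OF_MENTIONS, NERDY_SHARE_OF_RESPONSES)

# Everything else: the remaining 97.5% of responses and 33.3% of mentions.
other_lift = overrepresentation(
    1 - NERDY_SHARE_OF_MENTIONS, 1 - NERDY_SHARE_OF_RESPONSES
)

# Per-response mention rate in Nerdy mode relative to all other responses.
relative_rate = nerdy_lift / other_lift

print(f"Nerdy lift: {nerdy_lift:.1f}x, relative rate: {relative_rate:.0f}x")
```

Under those assumptions, Nerdy-mode responses carried "goblin" mentions at roughly 27 times their share of traffic, and at roughly 78 times the per-response rate of all other responses combined.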

Key takeaway

For AI developers and research scientists building or fine-tuning large language models, this incident highlights the importance of auditing training data and reward signals closely. Seemingly minor incentives can produce unexpected behaviors that generalize across a model, potentially affecting accuracy or introducing unwanted conversational quirks. Build internal tools for behavior auditing and review personality-driven fine-tuning carefully to catch unforeseen model drift.

Key insights

Subtle training reward signals can unexpectedly generalize and alter large language model behavior.

Principles

Reward signals can generalize beyond specific training conditions.
Personality fine-tuning may introduce accuracy trade-offs.

Method

OpenAI used its Codex tool to audit model behavior and found that a specific personality reward signal scored creature-containing outputs higher in 76.2% of the datasets reviewed.
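
A check of this kind can be sketched as follows. This is a minimal illustration, not OpenAI's tooling: the source does not describe how the audit was implemented, so the data layout (a mapping from dataset name to `(output_text, reward)` pairs), the regular expression, and the function names are all assumptions made for the example.

```python
import re
from statistics import mean
from typing import Optional

# Hypothetical creature vocabulary; a real audit would likely use a longer list.
CREATURE_PATTERN = re.compile(r"\b(goblin|gremlin)s?\b", re.IGNORECASE)

ScoredOutput = tuple[str, float]  # (output text, reward score)


def mentions_creature(text: str) -> bool:
    """True if the text contains one of the tracked creature words."""
    return bool(CREATURE_PATTERN.search(text))


def favors_creatures(scored_outputs: list[ScoredOutput]) -> Optional[bool]:
    """Compare mean reward of creature-containing outputs against the rest.

    Returns None when one of the two groups is empty, because no
    comparison is possible for that dataset.
    """
    with_creature = [r for text, r in scored_outputs if mentions_creature(text)]
    without = [r for text, r in scored_outputs if not mentions_creature(text)]
    if not with_creature or not without:
        return None
    return mean(with_creature) > mean(without)


def share_of_datasets_favoring(datasets: dict[str, list[ScoredOutput]]) -> float:
    """Fraction of comparable datasets where creature outputs score higher."""
    verdicts = [
        verdict
        for verdict in (favors_creatures(rows) for rows in datasets.values())
        if verdict is not None
    ]
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```

Comparing group means is the simplest possible test. A fuller audit would probably also control for response length and topic before attributing a score difference to the creature words themselves.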

In practice

Audit model behavior for unexpected linguistic patterns.
Scrutinize reward signals for unintended generalization.
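
One way to act on the first point is to compare how often a watched term appears in responses from two model releases and flag large rises, in the spirit of the 175% increase reported for "goblin". The sketch below is an assumption-laden illustration: the function names, the exact-word matching rule, and the 50% threshold are all hypothetical choices, not details from the source.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z']+")


def term_rates(responses: list[str], terms: list[str]) -> dict[str, float]:
    """Fraction of responses that contain each term as an exact word.

    Matching is exact, so "goblins" does not count toward "goblin";
    list plural forms separately if they matter.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    counts: Counter[str] = Counter()
    for text in responses:
        words = set(WORD.findall(text.lower()))
        for term in terms:
            if term in words:
                counts[term] += 1
    return {term: counts[term] / len(responses) for term in terms}


def flag_spikes(
    baseline: list[str],
    candidate: list[str],
    terms: list[str],
    min_rise_pct: float = 50.0,  # hypothetical alert threshold
) -> dict[str, float]:
    """Return terms whose rate rose by at least min_rise_pct percent."""
    before = term_rates(baseline, terms)
    after = term_rates(candidate, terms)
    flagged: dict[str, float] = {}
    for term in terms:
        if before[term] == 0:
            continue  # percent change is undefined from a zero baseline
        rise_pct = (after[term] - before[term]) / before[term] * 100
        if rise_pct >= min_rise_pct:
            flagged[term] = rise_pct
    return flagged
```

A fixed watch list only catches terms someone already suspects. Surfacing unexpected vocabulary would require comparing full word-frequency distributions between releases, which this sketch does not attempt.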

Topics

OpenAI
ChatGPT
GPT-5.1
Reinforcement Learning
Model Behavior

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Director of AI/ML, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Magazine.