EPIG: Emotion-Based Prompting for Personalised Image Generation
Summary
EPIG (Emotion-Based Prompting for Personalised Image Generation) is a training-free method that improves emotional expressiveness in text-to-image diffusion models by enriching prompts before image generation. It combines psychologically informed valence-arousal emotion representations with a structured, role-aware prompt enrichment pipeline. EPIG decomposes an input prompt into subject, stimulus, and context, then maps the target emotional state to the nearest descriptors in the NRC VAD lexicon, measured by Euclidean distance. These descriptors are injected into specific syntactic roles, steering generation towards emotionally coherent images without modifying or retraining the underlying diffusion model (for example, SDXL-Turbo). On a 10-prompt benchmark, EPIG reduces mean arousal error by 14% relative to naive insertion and by 12% relative to LLM-based expansion, with a 17% reduction for subject-centric prompts. It also preserves valence alignment and semantic content, which makes it suitable for resource-constrained and personalised scenarios.
Key takeaway
If you are a prompt engineer or ML engineer who needs precise emotional control in text-to-image generation, consider integrating EPIG's training-free, role-aware prompt enrichment. The method improves arousal alignment while preserving semantic content, particularly for subject-centric prompts. It lets you generate emotionally consistent visuals for applications such as therapeutic visualization or personalized content, while avoiding the computational overhead and affective drift of other methods. Note that it is most effective when the prompt has a clear emotional subject.
Key insights
Emotionally coherent image generation is achievable via prompt-level, role-aware descriptor injection using VAD lexicons.
Principles
- Affective roles (subject, stimulus, context) prevent semantic bleeding.
- Psychologically grounded VAD dimensions enable precise emotional control.
- Training-free prompt enrichment maintains efficiency and reproducibility.
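The role-based principle can be illustrated with a minimal sketch. It assumes a prompt has already been split into its three roles; the `RolePrompt` class, the `inject` helper, and the comma-joined rendering are hypothetical names and choices made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, replace

ROLES = ("subject", "stimulus", "context")


@dataclass(frozen=True)
class RolePrompt:
    """A prompt already decomposed into affective roles (hypothetical structure)."""
    subject: str
    stimulus: str
    context: str

    def render(self) -> str:
        # Joining with commas is an assumption made for this sketch.
        return ", ".join((self.subject, self.stimulus, self.context))


def inject(prompt: RolePrompt, role: str, descriptor: str) -> RolePrompt:
    """Return a copy with the descriptor prepended to one role only."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    enriched = f"{descriptor} {getattr(prompt, role)}"
    return replace(prompt, **{role: enriched})


base = RolePrompt(subject="woman", stimulus="watching a storm", context="beach at dusk")
enriched = inject(base, "subject", "elated")
print(enriched.render())  # elated woman, watching a storm, beach at dusk
```

Because the descriptor is attached to a single role, the other two roles pass through unchanged, which is the property the first principle describes as preventing semantic bleeding.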
Method
EPIG decomposes prompts, maps target emotions to NRC VAD lexicon descriptors via Euclidean distance, and inserts subject- or context-centric terms into specific syntactic roles.
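The lexicon lookup step can be sketched as a nearest-neighbour search in valence-arousal space. The snippet below is a sketch under stated assumptions: `TOY_LEXICON` is a hand-made stand-in whose scores are illustrative placeholders rather than real NRC VAD values, and the function name `nearest_descriptors`, the two-dimensional representation, and the parameter `k` are hypothetical.

```python
import math

# Toy stand-in for the NRC VAD lexicon: word -> (valence, arousal) in [0, 1].
# These scores are illustrative placeholders, NOT real lexicon entries.
TOY_LEXICON = {
    "serene": (0.85, 0.15),
    "elated": (0.95, 0.85),
    "furious": (0.10, 0.95),
    "gloomy": (0.15, 0.25),
    "tense": (0.30, 0.80),
}


def nearest_descriptors(target, lexicon, k=2):
    """Return the k words whose (valence, arousal) point is closest to target."""
    def distance(word):
        # Euclidean distance between the target point and the word's point.
        return math.dist(target, lexicon[word])

    return sorted(lexicon, key=distance)[:k]


# Target emotion: high valence, high arousal.
print(nearest_descriptors((0.9, 0.9), TOY_LEXICON))  # ['elated', 'tense']
```

With the real lexicon the same lookup would run over many thousands of entries, so some filtering of candidate words (for example by part of speech) would likely be needed before insertion; that filtering is not shown here.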
In practice
- Generate images with specific emotional tones for therapeutic visualization.
- Create personalized content with fine-grained affective control.
- Enhance emotional consistency in psychological research imagery.
Topics
- EPIG
- Emotion-aware Image Generation
- Prompt Engineering
- Text-to-Image Diffusion Models
- Valence-Arousal-Dominance
- NRC VAD Lexicon
Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Prompt Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.