Retell, Reward, Repeat: Reinforcement Learning for Narrative Theory-Informed Story Retelling

Source: cs.CL updates on arXiv.org · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

Researchers at UNSW explored direct reinforcement learning from AI feedback (d-RLAIF) as a post-training method for Automatic Story Generation (ASG), contrasting it with supervised fine-tuning (SFT). They used Todorov's Theory of Narrative Equilibrium to define desirable story qualities, then prompted 8B and 14B LLM-as-judge models (Selene-1-mini-8B, M-Prometheus-14B) with these principles to produce reward signals. Three open-weight LLMs (Llama-3.1-8B, Olmo-3-7B, Qwen-3-8B) were post-trained with d-RLAIF using GRPO and LoRA on the TimeTravel dataset. In an evaluation with Gemini-3-Flash, d-RLAIF produced more diverse stories that aligned better with human narrative conventions, and it outperformed SFT in overall quality (minLRC) when trained with a narrativity-based reward signal (RN). SFT, however, yielded closer linguistic similarity to the original stories and higher structural completeness. The study highlights d-RLAIF's promise for linguistically grounded ASG.

Key takeaway

AI scientists and machine learning engineers developing Automatic Story Generation systems should consider combining direct reinforcement learning from AI feedback (d-RLAIF) with reward models informed by narrative theory. This approach, particularly with narrativity-based reward signals, can yield more diverse and human-aligned stories than traditional supervised fine-tuning. Design the LLM-as-judge prompts and reward structures carefully: their characteristics significantly influence model convergence and output quality, even with smaller 8B models.

Key insights

Reinforcement learning with narrative theory-informed AI feedback improves story diversity and human narrative alignment.

Principles

Method

Post-train LLMs using d-RLAIF, where an LLM-as-judge, prompted with narrative theory principles, generates reward signals for GRPO optimization.
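The reward step in the method above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the 1-5 scale, the `Score:` output format, and the helper names (`build_judge_prompt`, `parse_score`, `group_relative_advantages`) are all assumptions made for illustration. The five stages are the commonly cited formulation of Todorov's narrative equilibrium, and the normalization is the usual group-relative advantage from GRPO (each reward minus the group mean, divided by the group standard deviation).

```python
import re
import statistics

# Commonly cited five stages of Todorov's narrative equilibrium.
# The rubric actually used in the paper is not given in this summary.
TODOROV_STAGES = [
    "equilibrium",
    "disruption",
    "recognition of the disruption",
    "attempt to repair the disruption",
    "new equilibrium",
]


def build_judge_prompt(story: str) -> str:
    """Build a hypothetical LLM-as-judge prompt (wording is an assumption)."""
    stages = "\n".join(f"{i}. {s}" for i, s in enumerate(TODOROV_STAGES, 1))
    return (
        "Rate how well the story follows these narrative stages:\n"
        f"{stages}\n\nStory:\n{story}\n\n"
        "Answer with 'Score: <integer 1-5>'."
    )


def parse_score(judge_output: str, default: float = 1.0) -> float:
    """Extract the 1-5 score; fall back to `default` if the judge is off-format."""
    match = re.search(r"Score:\s*([1-5])\b", judge_output)
    return float(match.group(1)) if match else default


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt.

    Uses the population standard deviation (an assumption; implementations
    differ on this). `eps` avoids division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Stand-in judge outputs for a group of three sampled retellings.
    judge_outputs = ["Score: 2", "Score: 4", "no parsable score"]
    rewards = [parse_score(o) for o in judge_outputs]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In a real training loop the judge outputs would come from a model such as the ones named above, and the advantages would weight the policy-gradient update for each sampled retelling. When every completion in a group receives the same score, all advantages are zero and the group contributes no learning signal, which is one reason the judge prompt and scoring scale matter for convergence.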

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.