Recovering Diversity Without Losing Alignment: A DPO Recipe for Post-Trained LLMs

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

REDIPO is an offline DPO data-construction pipeline designed to recover diverse valid answers in post-trained large language models (LLMs) while preserving their alignment benefits. The pipeline samples responses from both base and instruct models, rewrites base-model responses using the instruct model, filters candidates for safety and instruction-following quality, and then builds preference pairs that favor marginally diverse responses among candidates with similar instruction-following reward. Across Qwen3-4B, OLMo-3-7B, and LLaMA-3.1-8B, REDIPO improved NoveltyBench distinct_k by 134%, 33%, and 44% respectively, relative to the instruct checkpoints. MTBench, IFEval, and Arena-Hard performance was largely maintained alongside these gains, and direct-category HarmBench attack success rates went down.

Key takeaway

For Machine Learning Engineers or AI Scientists who want to increase the output diversity of fine-tuned LLMs without compromising alignment, REDIPO offers an empirically tested approach. It shows that diverse valid answers from base-model generations can be reintroduced through carefully constructed preference data. The released code and data at https://github.com/vsamuel2003/RiDiPO are a starting point for applying this DPO recipe in post-training workflows, especially for open-ended instruction tasks.

Key insights

Post-trained LLMs can regain output diversity without losing alignment when the DPO preference data is constructed carefully.
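
For context, the standard DPO objective is reproduced below to show where preference pairs enter training. The summary does not state which DPO variant or hyperparameters REDIPO uses, so this is the textbook form and should be read as an assumption. Here x is a prompt, y_w the preferred response, y_l the rejected response, \pi_{\mathrm{ref}} the frozen reference policy, and \beta a temperature-like scaling constant.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Because the loss depends on the data only through the triples (x, y_w, y_l), the choice of which response counts as preferred is the lever a data-construction pipeline can pull.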

Principles

Method

REDIPO samples from base and instruct models, rewrites base responses with the instruct model, filters for safety and instruction-following, then builds preference pairs favoring marginally diverse responses among candidates with similar instruction-following reward.
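
The pair-building step described above can be sketched as follows. This is a minimal illustration and not the released implementation. The `Candidate` type, the `build_pairs` function, the `reward_tol` and `min_div_gap` thresholds, and the token-level Jaccard distance used as a diversity proxy are all hypothetical stand-ins, and the candidates are assumed to have already passed the safety and instruction-following filters.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Candidate:
    text: str
    reward: float  # instruction-following reward from some scorer (assumed)


def jaccard_distance(a: str, b: str) -> float:
    """Token-set distance in [0, 1]; a crude proxy for semantic difference."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def marginal_diversity(idx: int, candidates: list[Candidate]) -> float:
    """Mean distance from candidates[idx] to every other candidate."""
    others = [c for j, c in enumerate(candidates) if j != idx]
    if not others:
        return 0.0
    target = candidates[idx].text
    return sum(jaccard_distance(target, o.text) for o in others) / len(others)


def build_pairs(prompt, candidates, reward_tol=0.1, min_div_gap=0.05):
    """Pair candidates with similar reward, preferring the more diverse one.

    reward_tol and min_div_gap are illustrative thresholds, not paper values.
    """
    div = [marginal_diversity(i, candidates) for i in range(len(candidates))]
    pairs = []
    for i, j in combinations(range(len(candidates)), 2):
        # Only compare responses of similar instruction-following quality.
        if abs(candidates[i].reward - candidates[j].reward) > reward_tol:
            continue
        gap = div[i] - div[j]
        # Skip pairs whose diversity is too close to give a clear preference.
        if abs(gap) < min_div_gap:
            continue
        chosen, rejected = (i, j) if gap > 0 else (j, i)
        pairs.append({
            "prompt": prompt,
            "chosen": candidates[chosen].text,
            "rejected": candidates[rejected].text,
        })
    return pairs


if __name__ == "__main__":
    demo = [
        Candidate("Paris is the capital of France", 0.90),
        Candidate("The capital of France is Paris", 0.88),
        Candidate("Lyon hosts a famous festival of lights", 0.91),
        Candidate("bad answer", 0.20),
    ]
    for pair in build_pairs("Tell me something about France.", demo):
        print(pair["chosen"], ">", pair["rejected"])
```

The reward tolerance is what keeps diversity from being bought with quality. A low-reward response is never paired against high-reward ones, however different it looks. The prompt/chosen/rejected record layout is the one DPO training code commonly expects, though the released repository may use a different schema.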

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.