TuneJury: An Open Metric for Improving Music Generation Preference Alignment
Summary
TuneJury is an open, instance-level pairwise reward model designed for text-to-music generation, predicting music preference scores from text prompts and audio clips. It is trained on publicly available human-preference labels, including arena-style (A vs. B) votes, metric-alignment pairs, crowdsourced comparisons, and expert aesthetic ratings. The model's predicted score margin is well calibrated on held-out test data, enabling data filtering via a simple score threshold. TuneJury demonstrates strong generalization to both held-out test pairs and out-of-distribution benchmarks, performing competitively against prior baselines. For generators released after its training, TuneJury introduces anchor calibration, a post-hoc, per-system Bradley-Terry calibration method that efficiently recovers agreement. This frozen reward model consistently drives gains across downstream applications like inference-time best-of-N selection, DITTO-style latent optimization, and expert-iteration post-training. The model is available at https://github.com/yonghyunk1m/TuneJury.
Key takeaway
For machine learning engineers developing text-to-music models, TuneJury provides an open metric for preference alignment. Use the frozen reward model to evaluate and refine generated outputs: filter data with a threshold on its calibrated scores, and apply anchor calibration when scoring generators released after its training. This can improve the perceived quality of generated music, with better data efficiency in post-training and optimization workflows.
Key insights
TuneJury offers an open, calibrated reward model for aligning text-to-music generation with human preferences.
Principles
- Predicted score margins are well calibrated
- Generalizes to out-of-distribution benchmarks
- Anchor calibration efficiently recovers agreement on generators released after training
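To make the calibration point concrete, here is a minimal sketch of threshold-based data filtering. It assumes the score margin between two clips maps to a preference probability through a Bradley-Terry (logistic) link, which the summary does not spell out. The function names, the tuple layout, and the `min_margin` value are hypothetical and not part of the TuneJury API.

```python
import math

def preference_probability(score_a: float, score_b: float) -> float:
    """Assumed Bradley-Terry link: probability that clip A is preferred over clip B."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

def filter_pairs(pairs, min_margin=1.0):
    """Keep only pairs whose absolute score margin is at least min_margin.

    Each input pair is (clip_a, score_a, clip_b, score_b), where the scores
    come from a reward model (hypothetical layout). Returns a list of
    (winner, loser, margin) tuples.
    """
    kept = []
    for clip_a, score_a, clip_b, score_b in pairs:
        margin = score_a - score_b
        if abs(margin) >= min_margin:
            # Order the pair so the higher-scored clip comes first.
            winner, loser = (clip_a, clip_b) if margin > 0 else (clip_b, clip_a)
            kept.append((winner, loser, abs(margin)))
    return kept

# Toy scores, invented for illustration only.
pairs = [("a.wav", 2.0, "b.wav", 0.5), ("c.wav", 0.1, "d.wav", 0.3)]
confident_pairs = filter_pairs(pairs, min_margin=1.0)
```

If the margins are well calibrated, a larger `min_margin` keeps fewer pairs but with a higher expected agreement with human votes, which is what makes a single threshold usable as a filter.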
Method
Anchor calibration applies a post-hoc, per-system Bradley-Terry calibration to recover agreement for new music generators released after training.
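The summary does not give the exact parameterization, so the following is only one possible reading of a per-system Bradley-Terry calibration: the reward model stays frozen, and a single additive offset per generator is fitted on a small set of human-labeled anchor pairs by gradient ascent on the Bradley-Terry log-likelihood. The additive-offset form, the L2 penalty, the hyperparameters, and all names are assumptions made for illustration.

```python
import math
from collections import defaultdict

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def fit_system_offsets(anchors, steps=500, lr=0.1, l2=0.01):
    """Fit one additive offset per system on human-labeled anchor pairs.

    Each anchor is (sys_a, score_a, sys_b, score_b, a_wins) with a_wins in {0, 1}.
    The scores come from the frozen reward model; only the offsets are learned.
    """
    offsets = defaultdict(float)
    n = len(anchors)
    for _ in range(steps):
        grads = defaultdict(float)
        for sys_a, score_a, sys_b, score_b, a_wins in anchors:
            p = sigmoid((score_a + offsets[sys_a]) - (score_b + offsets[sys_b]))
            # Gradient of the Bradley-Terry log-likelihood w.r.t. each offset.
            grads[sys_a] += (a_wins - p) / n
            grads[sys_b] -= (a_wins - p) / n
        for name, grad in grads.items():
            # Small L2 penalty keeps offsets finite when anchors are one-sided.
            offsets[name] += lr * (grad - l2 * offsets[name])
    return dict(offsets)

def calibrated_probability(offsets, sys_a, score_a, sys_b, score_b):
    """Preference probability after applying the fitted per-system offsets."""
    shift_a = offsets.get(sys_a, 0.0)
    shift_b = offsets.get(sys_b, 0.0)
    return sigmoid((score_a + shift_a) - (score_b + shift_b))
```

In this sketch, systems that never appear in the anchor set keep an offset of zero, so the raw score margins are used unchanged for them.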
In practice
- Use it for inference-time best-of-N selection
- Integrate it into DITTO-style latent optimization
- Use it as the reward for expert-iteration post-training
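Of the applications listed above, best-of-N selection is the simplest to sketch: generate N candidates for a prompt, score each with the frozen reward model, and keep the top one. The `score_fn` callable below is a hypothetical stand-in for whatever scoring interface the released model exposes; its signature is an assumption.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Clip = TypeVar("Clip")

def best_of_n(
    prompt: str,
    candidates: Sequence[Clip],
    score_fn: Callable[[str, Clip], float],
) -> Tuple[Clip, float]:
    """Return the candidate with the highest reward score, plus that score.

    score_fn(prompt, clip) is a placeholder for a reward model call.
    On ties, the earliest candidate wins.
    """
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    scores = [score_fn(prompt, clip) for clip in candidates]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index], scores[best_index]
```

The same selection step could plausibly serve expert iteration, where the selected clips become fine-tuning targets, though the summary gives no details of that recipe.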
Topics
- TuneJury
- Music Generation
- Reward Models
- Text-to-Music
- Preference Alignment
- Anchor Calibration
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.