UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs
Summary
UniEditBench is a unified benchmark for evaluating both image and video editing models, addressing the fragmentation of existing evaluation across methods and modalities. It supports reconstruction-based and instruction-driven approaches under a shared protocol, with a taxonomy of nine image operations (such as Add, Remove, and Replace) and eight video operations, including complex compositional tasks such as counting and spatial reordering. To avoid the high computational and financial cost of using multimodal large language models (MLLMs) as judges, UniEditBench distills a Qwen3-VL-235B-A22B Instruct judge into lightweight 4B/8B evaluators. These distilled evaluators provide multi-dimensional scoring for structural fidelity, text alignment, background consistency, naturalness, and temporal-spatial consistency, and show strong agreement with human judgments while significantly reducing deployment costs. The benchmark and its reward models are publicly available.
Key takeaway
For research scientists developing or comparing visual editing models, UniEditBench offers a standardized, cost-efficient evaluation framework. Consider integrating it to keep comparisons fair across editing paradigms and modalities, and use its distilled MLLM evaluators to obtain reliable, multi-dimensional scores without the cost of running the full-size judge.
Key insights
UniEditBench unifies image and video editing evaluation using cost-effective, distilled MLLM judges.
Principles
- Unified benchmarks improve cross-paradigm comparison.
- Distillation reduces MLLM evaluation costs.
- Multi-dimensional scoring enhances evaluation reliability.
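The claim of agreement with human judgments can be made concrete with a rank correlation between evaluator scores and human ratings. The sketch below computes Spearman's rank correlation in plain Python; the choice of Spearman is an assumption for illustration, since this summary does not say which agreement metric the authors report, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(evaluator: Sequence[float], human: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(evaluator) != len(human) or len(evaluator) < 2:
        raise ValueError("need two equally sized samples with at least 2 items")
    rx, ry = average_ranks(evaluator), average_ranks(human)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

Running this once per scoring dimension, rather than on a single pooled score, shows whether a distilled evaluator tracks human raters on every axis or only on some.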
Method
A high-capacity MLLM judge (Qwen3-VL-235B-A22B Instruct) is distilled into lightweight 4B/8B evaluators to provide multi-dimensional scoring for visual editing models, covering fidelity, alignment, consistency, and naturalness.
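This summary does not describe the training recipe, but one common way to distill a judge is to have the large model label editing samples and then fine-tune the small model on those labels. The sketch below covers only the data-collection half of that idea. Everything in it is an assumption for illustration: `teacher_judge` is a hypothetical callable standing in for the Qwen3-VL-235B-A22B Instruct judge, and the snake_case dimension keys and record layout are invented here, not taken from the benchmark's code.

```python
import json
from typing import Any, Callable, Dict, List

# Dimension names follow the summary; the snake_case keys are an assumption.
DIMENSIONS = (
    "structural_fidelity",
    "text_alignment",
    "background_consistency",
    "naturalness",
    "temporal_spatial_consistency",
)

Sample = Dict[str, Any]
Scores = Dict[str, float]


def build_distillation_set(
    samples: List[Sample],
    teacher_judge: Callable[[Sample], Scores],  # hypothetical teacher wrapper
) -> List[Dict[str, Any]]:
    """Pair each editing sample with the teacher's scores as a training target."""
    records = []
    for sample in samples:
        scores = teacher_judge(sample)
        unknown = sorted(set(scores) - set(DIMENSIONS))
        if unknown:
            raise ValueError(f"unknown score dimensions: {unknown}")
        # Serialize the scores so a student model can be trained to emit them as text.
        records.append({"input": sample, "target": json.dumps(scores, sort_keys=True)})
    return records
```

The resulting records could then feed any supervised fine-tuning pipeline for a 4B or 8B student; that step depends on the training framework and is left out.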
In practice
- Use UniEditBench for image/video editing model evaluation.
- Deploy 4B/8B distilled evaluators to save costs.
- Assess models across nine image and eight video operations.
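When assessing a model across operations, it helps to report each scoring dimension per operation instead of one pooled number. The following is a minimal sketch of that aggregation, assuming evaluator outputs have already been collected as dictionaries; the record layout (`operation`, `scores`) and the example values are hypothetical, and the summary does not state the score scale.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def summarize_by_operation(results: Iterable[dict]) -> Dict[str, Dict[str, float]]:
    """Average every score dimension separately for each editing operation."""
    buckets: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for result in results:
        for dimension, value in result["scores"].items():
            buckets[result["operation"]][dimension].append(value)
    return {
        operation: {dimension: mean(values) for dimension, values in dims.items()}
        for operation, dims in buckets.items()
    }


# Made-up evaluator outputs for two operations.
example_results = [
    {"operation": "Add", "scores": {"text_alignment": 4.0, "naturalness": 3.0}},
    {"operation": "Add", "scores": {"text_alignment": 2.0, "naturalness": 5.0}},
    {"operation": "Remove", "scores": {"text_alignment": 5.0, "naturalness": 4.0}},
]
summary = summarize_by_operation(example_results)
```

Keeping the dimensions separate makes it visible when a model follows instructions well on one operation but damages the background or naturalness on another.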
Topics
- UniEditBench
- Image Editing
- Video Editing
- MLLM Evaluation
- Model Distillation
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.