UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs

2026-04-17 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Expert, quick

Summary

UniEditBench is a unified benchmark for evaluating both image and video editing models, addressing the fragmentation of existing evaluation across methods and modalities. It supports reconstruction-based and instruction-driven approaches under a shared protocol, with a taxonomy of nine image operations (such as Add, Remove, and Replace) and eight video operations, including complex compositional tasks such as counting and spatial reordering. To avoid the high computational and financial cost of using multimodal large language models (MLLMs) as evaluators, UniEditBench distills a Qwen3-VL-235B-A22B Instruct judge into lightweight 4B/8B evaluators. These distilled evaluators score each edit on structural fidelity, text alignment, background consistency, naturalness, and temporal-spatial consistency, and they show strong agreement with human judgments while significantly reducing deployment costs. The benchmark and its reward models are publicly available.
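One way to check a claim like "agreement with human judgments" is a rank correlation between evaluator scores and human ratings on the same edits. The summary does not say which agreement statistic the authors used, so the sketch below picks Spearman's rank correlation as an assumption; the function names are hypothetical and not part of the UniEditBench code.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(evaluator_scores, human_scores):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(evaluator_scores), average_ranks(human_scores))
```

Applied per scoring dimension, a value near 1 would indicate that the evaluator orders edits much as human raters do.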

Key takeaway

For research scientists developing or comparing visual editing models, UniEditBench offers a standardized and cost-efficient evaluation framework. Consider integrating it to compare models fairly across editing paradigms (reconstruction-based and instruction-driven) and modalities (image and video), using its distilled 4B/8B evaluators to obtain multi-dimensional scores without the cost of running a 235B-parameter judge.

Key insights

UniEditBench unifies image and video editing evaluation using cost-effective, distilled MLLM judges.

Principles

Unified benchmarks improve cross-paradigm comparison.
Distillation reduces MLLM evaluation costs.
Multi-dimensional scoring enhances evaluation reliability.

Method

A high-capacity MLLM judge (Qwen3-VL-235B-A22B Instruct) is distilled into lightweight 4B/8B evaluators that score visual editing models along five dimensions: structural fidelity, text alignment, background consistency, naturalness, and temporal-spatial consistency.
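The multi-dimensional score and the teacher/student relationship can be sketched as a small data structure plus a per-dimension gap check. The dimension names come from the summary above; the `EditScore` class, the `per_dimension_gap` helper, and treating temporal-spatial consistency as optional (present only for video) are illustrative assumptions, not the benchmark's actual schema.

```python
from dataclasses import dataclass, fields
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class EditScore:
    """Hypothetical container for one evaluated edit."""
    structural_fidelity: float
    text_alignment: float
    background_consistency: float
    naturalness: float
    # Assumption: only video edits carry this dimension.
    temporal_spatial_consistency: Optional[float] = None


def per_dimension_gap(teacher, student):
    """Mean absolute difference between teacher and student scores, per dimension.

    `teacher` and `student` are equal-length lists of EditScore for the same edits.
    Dimensions with no scored pairs are omitted from the result.
    """
    gaps = {}
    for f in fields(EditScore):
        pairs = [(getattr(t, f.name), getattr(s, f.name)) for t, s in zip(teacher, student)]
        pairs = [(a, b) for a, b in pairs if a is not None and b is not None]
        if pairs:
            gaps[f.name] = mean(abs(a - b) for a, b in pairs)
    return gaps
```

A small gap on every dimension would suggest the lightweight evaluator tracks the large judge closely; how the authors actually train and validate the distilled models is not described in this summary.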

In practice

Use UniEditBench for image/video editing model evaluation.
Deploy 4B/8B distilled evaluators to save costs.
Assess models across nine image and eight video operations.
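Reporting results per operation, as the steps above suggest, could look like the following minimal sketch. The record layout, the `summarize_by_operation` name, and the unweighted mean over dimensions are assumptions made for illustration; the benchmark may aggregate scores differently.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_operation(records):
    """Average an overall score per (modality, operation) pair.

    Each record is a dict with keys 'modality', 'operation', and 'scores',
    where 'scores' maps a dimension name to a numeric score.
    """
    grouped = defaultdict(list)
    for record in records:
        # Assumption: every dimension is weighted equally.
        overall = mean(record["scores"].values())
        grouped[(record["modality"], record["operation"])].append(overall)
    return {key: mean(values) for key, values in sorted(grouped.items())}


# Usage with made-up numbers (not results from the benchmark):
example = [
    {"modality": "image", "operation": "Add", "scores": {"naturalness": 4, "text_alignment": 2}},
    {"modality": "image", "operation": "Add", "scores": {"naturalness": 5, "text_alignment": 3}},
    {"modality": "image", "operation": "Remove", "scores": {"naturalness": 2, "text_alignment": 2}},
]
summary = summarize_by_operation(example)
```

Keeping image and video operations under separate keys makes it easy to see which of the nine image or eight video operations a model handles poorly.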

Topics

UniEditBench
Image Editing
Video Editing
MLLM Evaluation
Model Distillation

Code references

wesar1/UniEditBench

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.