VMAF v1: Good Is Not Good Enough

Source: Netflix TechBlog - Medium · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Intermediate, long

Summary

Netflix has open-sourced VMAF v1, an updated version of Video Multimethod Assessment Fusion (VMAF), its metric that has become a de facto standard for evaluating video encodes. VMAF v1 addresses several limitations of its predecessor, VMAF v0, so that it assesses visual quality more accurately and helps deliver higher quality to Netflix members. It is more sensitive to compression artifacts because a new AIM component complements DLM. A single unified model generalizes across viewing conditions, namely 1080p at 3H, phone at 5H, and 4K at 1.5H and 3H (where H is the display height, so 3H means a viewing distance of three display heights), by modulating the spatial contrast sensitivity function (CSF). The new version also integrates CAMBI to detect banding artifacts, incorporates SpEED-QA to capture chroma artifacts, enables the no-enhancement-gain (NEG) mode by default, and improves the motion feature for high-motion and high-frame-rate sequences. Together with the removal of the computationally expensive VIF feature and other optimizations, these changes make the metric both more accurate and faster.
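
To make the viewing-condition labels concrete, here is a minimal sketch that converts a display height in pixels and a viewing distance in display heights (the "H" notation) into angular resolution in pixels per degree. It is plain geometry, not code from VMAF; the function name `pixels_per_degree` and the 1080p phone panel are assumptions made for illustration.

```python
import math


def pixels_per_degree(height_px: int, distance_in_heights: float) -> float:
    """Angular resolution at the screen centre for a viewer sitting
    `distance_in_heights` display heights away from a display that is
    `height_px` pixels tall. Hypothetical helper, not part of VMAF."""
    # Viewing distance expressed in pixel units (one pixel = one unit).
    distance_px = distance_in_heights * height_px
    # Visual angle subtended by a single pixel, in degrees.
    degrees_per_pixel = math.degrees(2.0 * math.atan(0.5 / distance_px))
    return 1.0 / degrees_per_pixel


# Viewing conditions named in the summary.
# Assumption: the phone is given a 1080-pixel-tall panel for illustration.
CONDITIONS = {
    "1080p@3H": (1080, 3.0),
    "phone@5H (assumed 1080p)": (1080, 5.0),
    "4K@1.5H": (2160, 1.5),
    "4K@3H": (2160, 3.0),
}

for name, (height_px, heights) in CONDITIONS.items():
    print(f"{name}: {pixels_per_degree(height_px, heights):.1f} px/deg")
```

Under these assumptions, 1080p at 3H and 4K at 1.5H land on the same angular resolution (about 56.5 pixels per degree), while 4K at 3H doubles it. This spread in angular resolution is the kind of variation a viewing-distance-aware contrast sensitivity function has to account for.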

Key takeaway

For video engineers and content providers optimizing encoding pipelines, VMAF v1 offers a more accurate and more efficient quality assessment tool. Integrate it to better evaluate compression, scaling, banding, and chroma artifacts across viewing conditions ranging from phones and 1080p to 4K. The update helps you maintain visual quality for your audience while potentially reducing computational overhead compared with VMAF v0. For 4K, consider the 1.5H setting for discerning viewers or the 3H setting for typical consumer viewing.

Key insights

VMAF v1 improves video quality assessment by adding perceptual features for compression, banding, and chroma artifacts and by using one unified model across diverse viewing conditions.

Method

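VMAF v1 fuses elementary quality features (DLM, AIM, CAMBI, and SpEED-QA) into a single score with a support vector regressor (SVR). To stay accurate across viewing conditions, it adjusts the feature values according to the normalized viewing distance by modulating the contrast sensitivity function (CSF).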
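
The fusion step can be illustrated with a minimal sketch of how an already-trained SVR with an RBF kernel turns a feature vector into one score. Everything numeric here is invented for illustration: the support vectors, coefficients, intercept, and `gamma` are not VMAF v1's trained parameters, the names `ToyModel` and `fuse` are hypothetical, and both the RBF kernel and the clipping to the 0 to 100 range are assumptions rather than confirmed details of v1.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Feature order follows the list in the text above.
FEATURE_ORDER = ("dlm", "aim", "cambi", "speed_qa")


@dataclass(frozen=True)
class ToyModel:
    """Hypothetical stand-in for a trained SVR; all values are made up."""
    support_vectors: Tuple[Tuple[float, ...], ...]
    dual_coefs: Tuple[float, ...]
    intercept: float
    gamma: float


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def fuse(features: Dict[str, float], model: ToyModel) -> float:
    """SVR decision function: intercept + sum_i coef_i * K(sv_i, x),
    clipped to [0, 100] (the clipping is an assumption)."""
    x = [features[name] for name in FEATURE_ORDER]
    raw = model.intercept + sum(
        coef * rbf_kernel(sv, x, model.gamma)
        for sv, coef in zip(model.support_vectors, model.dual_coefs)
    )
    return min(100.0, max(0.0, raw))


# Illustrative parameters only; these are not trained values.
TOY_MODEL = ToyModel(
    support_vectors=((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 1.0, 1.0)),
    dual_coefs=(40.0, -30.0),
    intercept=50.0,
    gamma=0.5,
)

frame_features = {"dlm": 1.0, "aim": 0.0, "cambi": 0.0, "speed_qa": 0.0}
print(f"fused score: {fuse(frame_features, TOY_MODEL):.2f}")
```

The sketch covers only the regression step. The CSF modulation described above would act earlier, when the elementary features are computed, and is not modelled here.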

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Netflix TechBlog - Medium.