TECCI: Tricky Edits of Collected and Curated Images

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Expert, quick

Summary

TECCI (Tricky Edits of Collected and Curated Images) is a new benchmark for systematically testing generative image editors. It comprises a novel set of images across 7 categories, intentionally curated to expose weaknesses in existing methods. Gemini automatically generates edit instructions covering 5 edit types per image, and 530 manually written challenging instructions supplement them, for a total of 7,550 image-instruction pairs. Human raters evaluated five leading models on TECCI for instruction following, minimality, and visual quality. A Gemini-based auto-rater matched the human judgments with 74.7% accuracy. No model exceeds a 22% overall success rate; Nano Banana Pro performs best. Models do best at instruction following but struggle with minimality and visual quality, particularly on architecture and nature images. Reasoning and creative edits proved most difficult, while color and appearance edits were easiest.
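To make the 74.7% figure concrete, here is a minimal sketch of how agreement between an auto-rater and human raters could be measured. It assumes each judgment is a binary pass/fail verdict and that accuracy is the plain fraction of matching verdicts; the summary does not say whether the reported number is computed per criterion or per overall verdict. The function name and the toy labels are hypothetical.

```python
from typing import Sequence


def rater_agreement(auto: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of items where the auto-rater verdict equals the human verdict."""
    if len(auto) != len(human):
        raise ValueError("auto and human verdict lists must have the same length")
    if not auto:
        raise ValueError("at least one verdict is required")
    matches = sum(a == h for a, h in zip(auto, human))
    return matches / len(auto)


# Toy, made-up verdicts for four edits (not TECCI data).
human_labels = [True, False, False, True]
auto_labels = [True, False, True, True]

agreement = rater_agreement(auto_labels, human_labels)
print(f"auto-rater agreement with humans: {agreement:.1%}")
```

In this toy case the auto-rater disagrees with the human verdict on one of four edits.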

Key takeaway

If you are an AI scientist or machine learning engineer developing generative image editing models, prioritize the challenging edits that the TECCI benchmark identifies. Current models likely struggle with minimal edits, visual quality, and complex reasoning or creative instructions, especially on architecture and nature images. Focusing research on these areas is the most direct way to advance model capabilities beyond the current sub-22% success rates.

Key insights

Generative image editors still struggle with complex edits, which is why benchmarks like TECCI are needed to evaluate them systematically.

Principles

Method

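TECCI's method has three steps: curate the images, generate edit instructions with Gemini (supplemented by manually written ones), and evaluate model outputs on instruction following, minimality, and visual quality. Human raters provide the reference judgments, and a Gemini-based auto-rater is used to scale the evaluation.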
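The three evaluation criteria can be aggregated into success rates along the following lines. This is a sketch under a stated assumption, not the benchmark's own scoring code: it treats an edit as an overall success only when it passes all three criteria, which the summary does not confirm. That assumption would explain how a model can score well on instruction following alone while its overall rate stays low. The `Judgment` type, its field names, and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Judgment:
    """One rated edit (hypothetical schema, not TECCI's actual format)."""
    category: str
    follows_instruction: bool
    is_minimal: bool
    good_visual_quality: bool

    @property
    def success(self) -> bool:
        # Assumption: overall success means passing all three criteria.
        return self.follows_instruction and self.is_minimal and self.good_visual_quality


def criterion_rates(judgments: Sequence[Judgment]) -> Dict[str, float]:
    """Pass rate for each criterion and for the combined overall verdict."""
    n = len(judgments)
    if n == 0:
        raise ValueError("at least one judgment is required")
    return {
        "follows_instruction": sum(j.follows_instruction for j in judgments) / n,
        "is_minimal": sum(j.is_minimal for j in judgments) / n,
        "good_visual_quality": sum(j.good_visual_quality for j in judgments) / n,
        "overall": sum(j.success for j in judgments) / n,
    }


def success_by_category(judgments: Sequence[Judgment]) -> Dict[str, float]:
    """Overall success rate within each image category."""
    totals: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for j in judgments:
        totals[j.category] += 1
        wins[j.category] += int(j.success)
    return {category: wins[category] / totals[category] for category in totals}


# Toy, made-up judgments (not TECCI data).
judgments = [
    Judgment("architecture", True, False, True),
    Judgment("architecture", True, True, True),
    Judgment("nature", True, True, False),
    Judgment("nature", False, False, False),
]

rates = criterion_rates(judgments)
per_category = success_by_category(judgments)
print(rates)
print(per_category)
```

With this toy data, instruction following passes on three of four edits while only one edit passes all three criteria, mirroring the gap the summary describes between per-criterion and overall performance.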

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.