TECCI: Tricky Edits of Collected and Curated Images
Summary
TECCI (Tricky Edits of Collected and Curated Images) is a new image editing benchmark designed to systematically test generative image editors. It comprises a novel set of images across 7 categories, intentionally curated to expose weaknesses in existing methods. Gemini automatically generates edit instructions of 5 types for each image, and 530 manually written challenging instructions supplement them, for a total of 7550 image-instruction pairs. Human raters evaluated five leading models on TECCI for instruction following, minimality, and visual quality. A Gemini-based auto-rater matched human judgments with 74.7% accuracy. No model exceeds a 22% overall success rate, and Nano Banana Pro performs best. Models excel at instruction following but struggle with minimal edits and visual quality, particularly for architecture and nature images. Reasoning and creative edits proved most difficult, while color and appearance edits were easiest.
Key takeaway
If you are an AI scientist or machine learning engineer developing generative image editing models, prioritize the challenging edits that the TECCI benchmark identifies. Your current models likely struggle with minimal edits, visual quality, and complex reasoning or creative instructions, especially for architecture and nature images. Focusing research on these areas is the clearest route to moving past the current sub-22% success rates.
Key insights
Generative image editors struggle with complex edits, so new benchmarks like TECCI are needed to evaluate them systematically.
Principles
- Current models fail complex edits.
- Minimality and visual quality are weak points.
- Reasoning edits are hardest.
Method
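TECCI's method has three steps: curating images, generating edit instructions (automatically with Gemini, plus a manually written set), and evaluating model outputs on instruction following, minimality, and visual quality. Human raters provide the reference judgments, and a Gemini-based auto-rater is used to scale the evaluation.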
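As a rough illustration of how the three criteria could combine into the reported numbers, here is a minimal Python sketch. It rests on assumptions that the summary does not confirm: that an edit counts as an overall success only when all three criteria pass, and that auto-rater accuracy is the fraction of per-criterion judgments on which the auto-rater agrees with the human rater. The names `Judgment`, `pass_rates`, and `rater_accuracy` are hypothetical, and the sample data is made up purely for demonstration.

```python
from dataclasses import dataclass

# The three criteria named in the summary.
CRITERIA = ("instruction_following", "minimality", "visual_quality")


@dataclass(frozen=True)
class Judgment:
    """One pass/fail rating of a single edited image (hypothetical schema)."""
    instruction_following: bool
    minimality: bool
    visual_quality: bool

    def success(self) -> bool:
        # Assumption: an edit is an overall success only if every criterion passes.
        return all(getattr(self, c) for c in CRITERIA)


def pass_rates(judgments):
    """Return the per-criterion pass rate and the overall success rate."""
    if not judgments:
        raise ValueError("need at least one judgment")
    n = len(judgments)
    rates = {c: sum(getattr(j, c) for j in judgments) / n for c in CRITERIA}
    rates["overall"] = sum(j.success() for j in judgments) / n
    return rates


def rater_accuracy(human, auto):
    """Fraction of per-criterion judgments where the auto-rater matches the human.

    Assumption: agreement is counted per criterion, not per image.
    """
    if len(human) != len(auto) or not human:
        raise ValueError("human and auto must be non-empty and the same length")
    matches = sum(
        getattr(h, c) == getattr(a, c)
        for h, a in zip(human, auto)
        for c in CRITERIA
    )
    return matches / (len(human) * len(CRITERIA))


# Made-up ratings for four edits, for illustration only.
human = [
    Judgment(True, True, True),
    Judgment(True, False, True),
    Judgment(True, True, False),
    Judgment(False, False, True),
]
auto = [
    Judgment(True, True, True),
    Judgment(True, True, True),
    Judgment(True, True, False),
    Judgment(False, False, False),
]

rates = pass_rates(human)
accuracy = rater_accuracy(human, auto)
print(rates)
print(f"auto-rater accuracy: {accuracy:.3f}")
```

Under the all-criteria-must-pass assumption, the overall rate can never exceed the weakest individual criterion, which is consistent with the summary's picture of models that follow instructions well yet still post low overall success rates.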
In practice
- Use TECCI to benchmark new models.
- Focus development on minimal edits.
- Prioritize reasoning and creative edit capabilities.
Topics
- Image Editing
- Generative AI
- Benchmarking
- Model Evaluation
- Gemini
- Computer Vision
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.