Creo: From One-Shot Image Generation to Progressive, Co-Creative Ideation

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation, Human-Computer Interaction · Depth: Expert, extended

Summary

Creo is a multi-stage text-to-image (T2I) system designed to align image generation with human creative processes, which typically proceed through progressive refinement rather than arriving at a fully rendered result in one shot. Conventional T2I systems make many visual decisions implicitly and early, which can anchor users prematurely and limit their control. Creo instead scaffolds generation from rough sketches to high-resolution outputs across five independent stages: viewpoint, composition, color, lighting, and style. At each stage, users make incremental changes through direct manipulation and AI-assisted tools, and a locking mechanism preserves earlier decisions to prevent unintended drift. In a comparative study against a one-shot baseline (ChatGPT), Creo users reported stronger ownership and greater control, and their outputs were less homogeneous, suggesting improved user agency and creativity. The system also supports non-linear workflows: users can revisit earlier stages and propagate changes while keeping the rest of the image consistent.
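
The staged, non-linear workflow can be pictured as an ordered list of stages, each with its own lock. The following is a minimal sketch under stated assumptions: only the five stage names come from the summary, while the `Session` and `StageState` names, the string-valued decisions, and the rule that editing a stage flags every unlocked later stage as stale are hypothetical and are not taken from Creo's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Stage order from the Creo summary; all other names here are hypothetical.
STAGES = ["viewpoint", "composition", "color", "lighting", "style"]


@dataclass
class StageState:
    value: str = ""       # the user's current decision for this stage (illustrative)
    locked: bool = False  # a locked stage keeps its decision when upstream stages change
    stale: bool = False   # set when an upstream edit may call for regeneration


@dataclass
class Session:
    states: Dict[str, StageState] = field(
        default_factory=lambda: {name: StageState() for name in STAGES}
    )

    def edit(self, stage: str, new_value: str) -> List[str]:
        """Revisit `stage`, then flag every unlocked later stage as stale.

        Assumed propagation rule, not Creo's documented behavior.
        Returns the names of the stages that were flagged.
        """
        if self.states[stage].locked:
            raise ValueError(f"{stage!r} is locked; unlock it before editing")
        self.states[stage].value = new_value
        self.states[stage].stale = False
        flagged = []
        for later in STAGES[STAGES.index(stage) + 1:]:
            if not self.states[later].locked:
                self.states[later].stale = True
                flagged.append(later)
        return flagged


session = Session()
session.states["color"].locked = True  # the user has committed to a palette
print(session.edit("composition", "subject left of center"))
# ['lighting', 'style']
```

Locking `color` means a composition change leaves the palette decision untouched, which is the kind of protection against unintended drift the summary describes; a system built this way would then regenerate only the stages flagged as stale.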

Key takeaway

For AI product managers designing generative tools, consider a multi-stage, progressive-commitment framework like Creo. In the comparative study, letting users refine images from sketches to high-fidelity outputs, with explicit control over distinct visual dimensions, was associated with stronger user ownership and more diverse creative exploration than a one-shot baseline. Prioritize interfaces that expose editable intermediate representations and reliably preserve earlier decisions, rather than single-shot, fully rendered outputs, to encourage deeper user engagement.

Key insights

Multi-stage T2I generation with intermediate control enhances user agency, creativity, and output diversity.

Method

Creo decomposes T2I generation into five stages: viewpoint, composition, color, lighting, and style. It uses sketch-based intermediate representations, combines direct manipulation with AI-assisted tools, and employs a locking mechanism so that updates stay stable and diff-based.
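
The summary does not specify what "diff-based" means in Creo. One plausible reading, sketched here purely as an assumption, is that a proposed update (for example, from an AI-assisted tool) is compared field by field with the current state, only the fields that actually changed are applied, and changes to locked fields are held back and reported. The function `apply_diff` and the attribute names are hypothetical.

```python
from typing import Any, Dict, Set, Tuple


def apply_diff(
    current: Dict[str, Any], proposed: Dict[str, Any], locked: Set[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical lock-aware update: apply changed, unlocked fields only.

    Returns (updated_state, rejected), where `rejected` holds proposed
    changes to locked fields that were not applied.
    """
    # Keep only fields whose proposed value differs from the current one.
    diff = {k: v for k, v in proposed.items() if current.get(k) != v}
    updated = dict(current)  # never mutate the caller's state
    rejected = {}
    for key, value in diff.items():
        if key in locked:
            rejected[key] = value  # preserve the earlier decision, report the attempt
        else:
            updated[key] = value
    return updated, rejected


current = {"camera": "low angle", "palette": "warm", "subject": "fox"}
proposed = {"camera": "low angle", "palette": "cool", "subject": "wolf"}
updated, rejected = apply_diff(current, proposed, locked={"palette"})
print(updated)   # {'camera': 'low angle', 'palette': 'warm', 'subject': 'wolf'}
print(rejected)  # {'palette': 'cool'}
```

Returning the rejected changes instead of silently dropping them lets an interface show the user which locked decisions a proposal tried to override.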

Best for: Computer Vision Engineer, Research Scientist, AI Product Manager, AI Scientist, Product Designer, Creative Technologist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.