TMP: Tree-structured Mixed-policy Pruning for Large-scale Image Generation and Editing

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Expert, quick

Summary

Tree-structured Mixed-policy Pruning (TMP) is a pruning framework that targets the growing parameter and computation demands of modern image generation models. It generalizes across common image tasks such as Text-to-Image (T2I) and Image-to-Image (TI2I), and across architectures including Mixture-of-Experts (MoE) and Diffusion Transformers (DiT). In experiments, TMP compresses HunyuanImage-3.0 from 80 billion to 20 billion parameters, a 75% reduction, with limited quality loss; the pruned 20B model can run inference on a single 24GB 4090 GPU. TMP also compresses Z-Image turbo from 6 billion to 4 billion parameters (a 33% reduction) with negligible degradation.
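The reduction figures quoted above follow directly from the parameter counts. A minimal sketch of that arithmetic is below; the helper name `reduction_percent` is illustrative and not taken from the paper.

```python
def reduction_percent(before_b: float, after_b: float) -> float:
    """Percentage of parameters removed, given counts in billions."""
    return 100.0 * (before_b - after_b) / before_b


# Parameter counts (in billions) as reported in the summary.
models = {
    "HunyuanImage-3.0": (80, 20),
    "Z-Image turbo": (6, 4),
}

for name, (before, after) in models.items():
    removed = reduction_percent(before, after)
    ratio = before / after  # how many times smaller the pruned model is
    print(f"{name}: {removed:.0f}% removed, {ratio:.1f}x smaller")
```

Put differently, the 75% reduction corresponds to a 4x smaller model, while the 33% reduction corresponds to a 1.5x smaller one.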

Key takeaway

For AI engineers deploying large image generation models, TMP offers a way to cut parameter count and GPU memory footprint. The reported results compress HunyuanImage-3.0 by 75% (80B to 20B) and enable inference on a single 24GB 4090 GPU, which makes high-fidelity models more accessible and cost-effective in production. TMP also applies to step-distilled models, so it is worth considering for those you already run.
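To get a rough feel for the single-GPU claim, the sketch below estimates the memory needed for the weights alone of a 20B-parameter model at a few precisions. The summary does not say which precision, quantization, or offloading scheme is used, so the bit widths here are assumptions, and activations, text encoders, and other runtime buffers are ignored.

```python
def weight_gib(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight storage in GiB (weights only, no activations)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


GPU_MEMORY_GIB = 24  # nominal capacity of the card named in the summary

# Assumed precisions; the source does not state which one is used.
for bits in (16, 8, 4):
    gib = weight_gib(20, bits)
    fits = "fits" if gib < GPU_MEMORY_GIB else "does not fit"
    print(f"20B params at {bits}-bit: {gib:.1f} GiB of weights ({fits} in 24 GiB)")
```

Under these assumptions, 16-bit weights alone would exceed 24 GiB, so the reported single-GPU inference presumably relies on lower precision, offloading, or sparse expert loading; the summary does not specify which.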

Key insights

TMP is a tree-structured mixed-policy pruning framework for compressing large image generation models across various architectures and tasks.

Principles

Method

TMP prunes large image generation models with a tree-structured, mixed-policy scheme. It generalizes across T2I and TI2I tasks and across MoE and DiT architectures, and it also applies to step-distilled models.

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.