TMP: Tree-structured Mixed-policy Pruning for Large-scale Image Generation and Editing

2026-06-25 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

Tree-structured Mixed-policy Pruning (TMP) is a pruning framework aimed at the growing parameter counts and compute demands of modern image generation models. It generalizes across common image tasks, such as Text-to-Image (T2I) and Text-and-Image-to-Image (TI2I), and across architectures, including Mixture-of-Experts (MoE) and Diffusion Transformers (DiT). In the reported experiments, TMP compresses HunyuanImage-3.0 from 80 billion to 20 billion parameters, a 75% reduction, with limited loss in quality, and the pruned 20B model can run inference on a single 24 GB RTX 4090 GPU. TMP also compresses Z-Image turbo from 6 billion to 4 billion parameters (a 33% reduction) with negligible degradation.

Key takeaway

For AI engineers deploying large image generation models, TMP offers a way to cut parameter count and GPU memory footprint substantially. The reported results compress HunyuanImage-3.0 by 75% (80B to 20B), enough to run inference on a single 24 GB RTX 4090 GPU, which makes high-fidelity models more accessible and cheaper to serve in production. TMP is also reported to work on step-distilled models, so it is worth evaluating if you already deploy one.

Key insights

TMP is a tree-structured mixed-policy pruning framework for compressing large image generation models across various architectures and tasks.

Principles

Pruning can remove most of a model's parameters (75% for HunyuanImage-3.0) with limited quality loss.
Mixed-policy pruning generalizes across MoE and DiT architectures.
Step-distilled models can be pruned further.

Method

TMP prunes large image generation models with a tree-structured, mixed-policy scheme. The same framework covers T2I and TI2I tasks, MoE and DiT architectures, and step-distilled models.
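
This summary does not describe how TMP builds its tree or which policies it mixes. Purely as an illustration of the general idea of applying different pruning policies to different parts of a model hierarchy, and not as TMP's algorithm, the sketch below walks a toy module tree and applies a hypothetical policy per leaf type: keeping the highest-scoring experts in an MoE block and the highest-scoring channels in a dense block. Every name here (Node, keep_top, POLICIES, the importance scores, and the keep ratios) is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """A node in a toy module tree (hypothetical, not TMP's data structure)."""
    name: str
    kind: str = "group"  # "group", "moe", or "dense"
    scores: List[float] = field(default_factory=list)  # importance per expert/channel
    children: List["Node"] = field(default_factory=list)


def keep_top(scores: List[float], keep_ratio: float) -> List[int]:
    """Return indices of the highest-scoring units (at least one), in original order."""
    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# One policy per leaf type; the keep ratios are arbitrary illustrative values.
POLICIES: Dict[str, Callable[[Node], List[int]]] = {
    "moe": lambda n: keep_top(n.scores, 0.25),    # keep a quarter of the experts
    "dense": lambda n: keep_top(n.scores, 0.5),   # keep half of the channels
}


def prune(node: Node, plan: Optional[Dict[str, List[int]]] = None) -> Dict[str, List[int]]:
    """Walk the tree depth-first and record which units each prunable leaf keeps."""
    plan = {} if plan is None else plan
    if node.kind in POLICIES:
        plan[node.name] = POLICIES[node.kind](node)
    for child in node.children:
        prune(child, plan)
    return plan


model = Node("model", children=[
    Node("block0", children=[
        Node("block0.experts", "moe", [0.9, 0.1, 0.4, 0.05, 0.3, 0.2, 0.7, 0.6]),
        Node("block0.mlp", "dense", [1.2, 0.3, 0.8, 0.1]),
    ]),
])
plan = prune(model)
```

Here plan maps each prunable leaf to the unit indices it retains. A real framework would also need a way to score units and to choose ratios per node, which this summary does not cover.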

In practice

Compress HunyuanImage-3.0 from 80B to 20B parameters.
Run the pruned 20B model on a single 24 GB RTX 4090 GPU.
Reduce Z-Image turbo from 6B to 4B parameters.
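
The figures in this list can be sanity-checked with back-of-the-envelope arithmetic. The sketch below recomputes the reduction percentages and estimates weight-only memory at a few common numeric precisions. The precisions are assumptions for illustration: this summary does not state which precision or memory-saving technique the single-GPU result uses, and the estimate ignores activations and framework overhead.

```python
def reduction_pct(before_b: float, after_b: float) -> float:
    """Percentage of parameters removed (inputs in billions of parameters)."""
    return 100.0 * (before_b - after_b) / before_b


def weight_gb(params_b: float, bytes_per_param: float) -> float:
    """Weight-only footprint in GB, taking 1 GB = 1e9 bytes.

    Billions of parameters times bytes per parameter gives GB directly.
    Activations, caches, and framework overhead are not counted.
    """
    return params_b * bytes_per_param


# Assumed storage precisions (not stated in the summary).
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}

if __name__ == "__main__":
    print(f"HunyuanImage-3.0: {reduction_pct(80, 20):.0f}% reduction")
    print(f"Z-Image turbo:    {reduction_pct(6, 4):.0f}% reduction")
    for name, nbytes in BYTES_PER_PARAM.items():
        print(f"20B weights at {name}: {weight_gb(20, nbytes):.0f} GB")
```

At 16-bit precision, 20B parameters occupy roughly 40 GB for weights alone, which exceeds 24 GB, so the single-GPU result presumably relies on lower-precision weights, offloading, or a similar technique.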

Topics

Image Generation
Model Pruning
Diffusion Transformers
Mixture-of-Experts
HunyuanImage-3.0
GPU Optimization

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.