EfficientUICoder: A Bidirectional Token Compression Framework for Efficient MLLM-Based UI Code Generation
Summary
EfficientUICoder is a bidirectional token compression framework designed to make Multimodal Large Language Models (MLLMs) more efficient on UI-to-Code (UI2Code) tasks. It addresses the substantial computational overhead caused by redundant input image tokens and extensive output code tokens. The framework comprises three key components: Element and Layout-aware Token Compression, which constructs UI element trees; Region-aware Token Refinement, which uses attention scores to optimize token selection; and Adaptive Duplicate Token Suppression, which penalizes repetitive HTML/CSS generation. Experiments show that EfficientUICoder achieves a 55%-60% compression ratio without compromising output quality, reducing computational cost by 44.9%, generated tokens by 41.4%, prefill time by 46.6%, and inference time by 48.8% on 34B-level MLLMs.
Key takeaway
AI engineers building MLLM-based UI-to-Code solutions should consider integrating token compression to improve efficiency. EfficientUICoder's results show that reducing redundant input image tokens and output code tokens can cut computational cost by 44.9% and inference time by 48.8% on 34B-level MLLMs without compromising output quality. Evaluate your current MLLM pipeline for similar token redundancies, since they may offer comparable gains.
Key insights
MLLM UI2Code efficiency is significantly improved by bidirectionally compressing redundant input image and output code tokens.
Principles
- Redundant visual tokens inflate computational cost and distract attention.
- MLLMs often produce superfluous HTML/CSS structures and text.
- Computational complexity strongly depends on sequence length.
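The third principle can be illustrated with a rough cost model. The sketch below is a simplification and not the paper's measurement: it assumes a per-layer cost with an attention term that grows with the square of the token count and a projection/MLP term that grows linearly. The function names, the token counts, and the hidden size are all hypothetical.

```python
def relative_layer_cost(n_tokens: int, d_model: int) -> float:
    """Rough per-layer cost of a transformer block (assumed model, constants dropped).

    attention term:      ~ n_tokens^2 * d_model
    projection/MLP term: ~ n_tokens * d_model^2
    """
    attention = n_tokens ** 2 * d_model
    linear = n_tokens * d_model ** 2
    return float(attention + linear)


def cost_ratio(n_before: int, n_after: int, d_model: int) -> float:
    """Cost after compression divided by cost before compression."""
    return relative_layer_cost(n_after, d_model) / relative_layer_cost(n_before, d_model)


if __name__ == "__main__":
    # Hypothetical numbers: halving a 2000-token input at hidden size 4096.
    print(f"{cost_ratio(2000, 1000, 4096):.3f}")
```

Because of the quadratic term, halving the sequence length in this model removes more than half of the cost, which is why removing redundant tokens pays off on both the input and the output side.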
Method
EfficientUICoder uses UI element detection and tree construction for visual compression, refines tokens via attention scores, and applies exponential penalties to suppress duplicate HTML/CSS during decoding.
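Two of these steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `select_tokens_by_attention` keeps the highest-scoring tokens given one attention score per token, and `penalize_repeats` subtracts a penalty from a token's logit that grows exponentially with how often that token has already been generated. The function names, the `keep_ratio`, `base`, and `min_count` parameters, and the exact penalty formula are all hypothetical; the summary does not specify them.

```python
import math
from collections import Counter


def select_tokens_by_attention(scores, keep_ratio):
    """Return the indices of the top-scoring tokens, in their original order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, math.ceil(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def penalize_repeats(logits, generated, base=1.2, min_count=2):
    """Lower the logits of tokens that already appear at least min_count times.

    Assumed penalty: base ** count - 1, so it grows exponentially with repeats.
    """
    counts = Counter(generated)
    adjusted = list(logits)
    for token_id, count in counts.items():
        if count >= min_count and 0 <= token_id < len(adjusted):
            adjusted[token_id] -= base ** count - 1.0
    return adjusted


if __name__ == "__main__":
    print(select_tokens_by_attention([0.1, 0.9, 0.3, 0.7], keep_ratio=0.5))
    print(penalize_repeats([1.0, 1.0, 1.0], generated=[0, 0, 0, 1], base=2.0))
```

In a real decoder the penalty would be applied to the logits at every generation step before sampling, and the counts would more plausibly track repeated HTML/CSS structures than individual token ids.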
In practice
- Achieves a 55%-60% token compression ratio.
- Reduces inference time by 48.8% on 34B-level MLLMs.
- Decreases computational cost by 44.9% on 34B-level MLLMs.
Topics
- Multimodal Large Language Models
- UI-to-Code
- Token Compression
- Code Generation
- Web Development
- Computational Efficiency
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.