EfficientUICoder: A Bidirectional Token Compression Framework for Efficient MLLM-Based UI Code Generation

2025-07-05 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, long

Summary

EfficientUICoder is a bidirectional token compression framework that makes Multimodal Large Language Models (MLLMs) more efficient on UI-to-Code (UI2Code) tasks. It targets two sources of computational overhead: redundant input image tokens and overly long output code. The framework has three components: Element and Layout-aware Token Compression, which detects UI elements and builds a UI element tree to compress the visual input; Region-aware Token Refinement, which uses attention scores to refine which tokens are kept; and Adaptive Duplicate Token Suppression, which penalizes repetitive HTML/CSS generation during decoding. In experiments, EfficientUICoder reaches a 55%-60% compression ratio without compromising output quality, and on 34B-level MLLMs it reduces computational cost by 44.9%, generated tokens by 41.4%, prefill time by 46.6%, and inference time by 48.8%.
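
As a rough illustration of selecting tokens by attention score, the sketch below keeps the highest-scoring fraction of visual tokens and preserves their original order. It is a plain top-k filter written for this summary, not the paper's Region-aware Token Refinement procedure; the function name `select_tokens_by_attention`, the `keep_ratio` parameter, and the example scores are all hypothetical.

```python
from typing import List, Sequence


def select_tokens_by_attention(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return the indices of the highest-scoring tokens, in original order.

    Assumption: `scores` holds one attention score per visual token.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(scores) * keep_ratio))
    # Rank indices by descending score; ties go to the earlier token.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Restore positional order so the kept tokens stay in sequence.
    return sorted(ranked[:k])


scores = [0.02, 0.30, 0.05, 0.25, 0.01, 0.37]  # made-up scores
kept = select_tokens_by_attention(scores, keep_ratio=0.5)
print(kept)  # [1, 3, 5]
```

Returning indices rather than the tokens themselves lets the same selection be applied to any per-token tensor, such as embeddings or position ids.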

Key takeaway

AI engineers building MLLM-based UI-to-Code systems should consider adding token compression to their pipelines. EfficientUICoder shows that removing redundant input image tokens and redundant output code tokens can cut computational cost by 44.9% and inference time by 48.8% on 34B-level MLLMs without compromising output quality. Audit your own MLLM pipeline for the same two kinds of redundancy before scaling up hardware.

Key insights

MLLM UI2Code efficiency is significantly improved by bidirectionally compressing redundant input image and output code tokens.

Principles

Redundant visual tokens inflate computational cost and distract attention.
MLLMs often produce superfluous HTML/CSS structures and text.
Computational cost grows with sequence length, and the self-attention portion grows quadratically with it.
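
To make the dependence on sequence length concrete, here is a back-of-the-envelope FLOP estimate for one Transformer layer. It assumes a textbook layer (four d×d attention projections and an MLP with a 4× hidden width) and counts one multiply-add as two FLOPs; real MLLMs differ in detail, and `layer_flops` is a hypothetical helper, not something taken from the paper.

```python
def layer_flops(n_tokens: int, d_model: int) -> int:
    """Rough forward-pass FLOPs for one standard Transformer layer."""
    # Q, K, V, output projections (8*n*d^2) plus a 4x-wide MLP (16*n*d^2).
    linear = 24 * n_tokens * d_model ** 2
    # QK^T score matrix plus the attention-weighted sum over V.
    attention = 4 * n_tokens ** 2 * d_model
    return linear + attention


short = layer_flops(1000, 4096)
long = layer_flops(2000, 4096)
print(long / short)  # about 2.08: doubling the tokens more than doubles the cost
```

The linear term scales in proportion to the token count and the attention term scales with its square, so every token removed from the input saves work in both.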

Method

EfficientUICoder uses UI element detection and tree construction for visual compression, refines tokens via attention scores, and applies exponential penalties to suppress duplicate HTML/CSS during decoding.
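
The idea of an exponential penalty on duplicates can be sketched as follows. This is a toy under stated assumptions, not the paper's Adaptive Duplicate Token Suppression: the penalty form `scale * (base**count - 1)`, the names `penalize_repeats`, `base`, and `scale`, and the fixed example logits are all invented for illustration.

```python
from collections import Counter
from typing import Dict


def penalize_repeats(
    logits: Dict[str, float],
    seen: Counter,
    base: float = 2.0,
    scale: float = 1.0,
) -> Dict[str, float]:
    """Lower each candidate's logit by scale * (base**count - 1).

    A candidate that has not been emitted yet (count 0) is left unchanged.
    """
    return {
        cand: logit - scale * (base ** seen[cand] - 1.0)
        for cand, logit in logits.items()
    }


# Toy greedy decoding loop with the same raw logits at every step.
logits = {"<div class='card'>": 3.5, "</body>": 2.0}
seen: Counter = Counter()
picks = []
for _ in range(3):
    adjusted = penalize_repeats(logits, seen)
    choice = max(adjusted, key=adjusted.get)
    seen[choice] += 1
    picks.append(choice)

print(picks)  # ["<div class='card'>", "<div class='card'>", '</body>']
```

Because the penalty grows exponentially with the repeat count, the first repetition costs little while later ones are quickly priced out, which is why the loop emits the `div` twice and then moves on.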

In practice

Achieves a 55%-60% token compression ratio without compromising output quality.
Reduces inference time by 48.8% on 34B-level MLLMs.
Decreases computational cost by 44.9% on 34B-level MLLMs.
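
Percentage reductions like these are easier to compare once converted to speedup factors. The helper below does that conversion; `speedup_from_reduction` is a hypothetical name, and the calculation is simple arithmetic on the reported figures rather than an additional result from the paper.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in time or cost into a speedup factor."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


print(round(speedup_from_reduction(48.8), 2))  # 1.95 (inference time)
print(round(speedup_from_reduction(44.9), 2))  # 1.81 (computational cost)
```

In other words, a 48.8% cut in inference time means the compressed pipeline runs a little under twice as fast.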

Topics

Multimodal Large Language Models
UI-to-Code
Token Compression
Code Generation
Web Development
Computational Efficiency

Code references

WebPAI/EfficientUICoder

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.