LoopQ: Quantization for Recursive Transformers

2026-05-19 · Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

LoopQ is a post-training quantization (PTQ) framework for Looped Language Models (LoopLMs), which recursively reuse Transformer blocks for parameter efficiency. LoopLMs are highly sensitive to quantization error for three reasons: activation distributions shift across the computational roles a reused block plays, state is reused across loop transitions, and errors accumulate recursively. LoopQ keeps a single shared quantized backbone and adds lightweight, loop-dependent adaptations: Loop-aware Activation Scaling (LAS) for magnitude drift, Selective Loop-aware Transformation (SLT) for geometry mismatch, and a Cross-loop Transition Adapter (CTA) for state alignment. Evaluated on seven benchmarks and four LoopLM architectures (including Ouro, LoopFormer, and Parcae), LoopQ outperforms static PTQ baselines, with a 68.8% improvement in average downstream accuracy and an 87.7% reduction in average perplexity under W4A4 quantization (4-bit weights, 4-bit activations).

Key takeaway

For AI engineers deploying LoopLMs on resource-constrained devices, LoopQ targets the severe degradation that post-training quantization typically causes in these models. Teams should consider integrating it to recover accuracy and perplexity, especially under aggressive W4A4 quantization, so that the parameter efficiency of LoopLMs carries over to low-bit inference without sacrificing model quality. The approach is particularly relevant for long-context prediction tasks, where error accumulation is most pronounced.

Key insights

LoopQ makes low-bit quantization of recursive LoopLMs robust by adapting to activation distributions that change from loop to loop and by limiting how quantization error propagates across loops.

Principles

Recursive computation amplifies quantization errors.
Static quantization fails with dynamic activation distributions.
Lightweight, targeted adaptations preserve efficiency.
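
The first principle can be illustrated with a toy experiment. The sketch below is not the paper's setup: it assumes a hypothetical shared block that is just a fixed orthogonal matrix (so the full-precision state keeps its norm) and applies 4-bit abs-max fake quantization to the state after every loop. All names and sizes are illustrative.

```python
import numpy as np

def fake_quant(x, bits=4):
    """Symmetric per-tensor fake quantization with an abs-max scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
d, num_loops = 256, 8

# Toy shared block (assumption): an orthogonal matrix reused at every loop.
rot, _ = np.linalg.qr(rng.normal(size=(d, d)))

h_fp = rng.normal(size=d)   # full-precision state
h_q = h_fp.copy()           # state that is re-quantized after each loop
rel_errors = []
for t in range(num_loops):
    h_fp = rot @ h_fp
    h_q = fake_quant(rot @ h_q)   # this loop's rounding error feeds the next loop
    rel_errors.append(np.linalg.norm(h_q - h_fp) / np.linalg.norm(h_fp))

for t, err in enumerate(rel_errors, start=1):
    print(f"loop {t}: relative error {err:.3f}")
```

Because the quantized state is fed back into the same block, each loop adds fresh rounding error on top of the error inherited from earlier loops, which is why the gap to the full-precision trajectory widens with depth in this toy.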

Method

LoopQ combines loop-aware activation scaling, selective transformation based on sharing-gap analysis, cross-loop state alignment, and trajectory-aware calibration to mitigate recursive quantization errors.
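
A minimal structural sketch of "shared quantized backbone plus loop-dependent adaptations" might look as follows. Everything here is an assumption made for illustration: the class name, the diagonal gain-and-bias form of the transition adapter, and the placeholder scale values are hypothetical and not taken from the paper, and the selective transformation and calibration steps are omitted.

```python
import numpy as np
from dataclasses import dataclass

def fake_quant(x, scale, bits=4):
    """Symmetric fake quantization: snap to a signed integer grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

@dataclass
class LoopedBlock:
    w_q: np.ndarray         # (d, d) shared 4-bit weights, reused by every loop
    act_scales: np.ndarray  # (num_loops,) loop-dependent activation scales
    gain: np.ndarray        # (num_loops, d) hypothetical transition adapter: gain
    bias: np.ndarray        # (num_loops, d) hypothetical transition adapter: bias

    def forward(self, h):
        for t in range(len(self.act_scales)):
            h = h * self.gain[t] + self.bias[t]    # align the state entering loop t
            a = fake_quant(h, self.act_scales[t])  # quantize with loop t's own scale
            h = np.tanh(a @ self.w_q)              # shared backbone computation
        return h

    def extra_params(self):
        """Number of loop-dependent parameters added on top of the backbone."""
        return self.act_scales.size + self.gain.size + self.bias.size

d, num_loops = 64, 4
rng = np.random.default_rng(0)
w = rng.normal(scale=d ** -0.5, size=(d, d))
block = LoopedBlock(
    w_q=fake_quant(w, np.abs(w).max() / 7),
    act_scales=np.full(num_loops, 0.25),   # placeholder values, not calibrated
    gain=np.ones((num_loops, d)),          # identity adapter as a starting point
    bias=np.zeros((num_loops, d)),
)
out = block.forward(rng.normal(size=(2, d)))
overhead = block.extra_params() / block.w_q.size
print(out.shape, f"loop-dependent overhead = {overhead:.1%}")
```

In this toy the loop-dependent parameters come to about 12.6% of the backbone at d = 64. The backbone grows with d squared while these extras grow with d, so the relative overhead shrinks as the model gets wider.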

In practice

Use loop-dependent activation scales for magnitude drift.
Apply selective transformations for geometry mismatch.
Implement cross-loop adapters to stabilize transitions.
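
The first recommendation can be sketched with synthetic data. This is a hedged illustration, not LoopQ's LAS procedure: it assumes activations whose standard deviation grows fourfold per loop (a made-up drift) and compares one static abs-max scale against a separate abs-max scale per loop under 4-bit quantization.

```python
import numpy as np

def quantize(x, scale, bits=4):
    """Symmetric fake quantization with a given scale."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def absmax_scale(x, bits=4):
    """Scale that maps the largest magnitude onto the top of the integer grid."""
    return np.abs(x).max() / (2 ** (bits - 1) - 1)

def rel_err(x, x_hat):
    """Mean squared error relative to signal power."""
    return np.mean((x - x_hat) ** 2) / np.mean(x ** 2)

rng = np.random.default_rng(0)
# Hypothetical calibration activations: magnitude drifts upward with loop index.
loop_acts = [rng.normal(0.0, 4.0 ** t, size=4096) for t in range(4)]

static_scale = absmax_scale(np.concatenate(loop_acts))  # one scale for all loops
loop_scales = [absmax_scale(a) for a in loop_acts]      # one scale per loop

static_err = [rel_err(a, quantize(a, static_scale)) for a in loop_acts]
loop_err = [rel_err(a, quantize(a, s)) for a, s in zip(loop_acts, loop_scales)]

for t, (s, l) in enumerate(zip(static_err, loop_err)):
    print(f"loop {t}: static {s:.3f}  per-loop {l:.3f}")
```

The static scale is dictated by the largest-magnitude loop, so the early, small-magnitude loops collapse onto very few grid points. Storing one scalar per loop avoids that while leaving the shared quantized weights untouched.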

Topics

LoopQ Framework
Recursive Transformers
Post-Training Quantization
Quantization Error Accumulation
Loop-aware Activation Scaling

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.