Parcae: Doing more with fewer parameters using stable looped models

2026-04-29 · Source: Together AI | The AI Native Cloud - Together.ai · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, medium

Summary

Parcae, a looped language model architecture published on April 15, 2026, matches the performance of a Transformer twice its size while training stably and predictably. Earlier looped architectures suffered from residual state explosion and loss spikes. Parcae avoids these by explicitly enforcing stability conditions through a constrained parameterization of input injection. It achieves up to 6.3% lower validation perplexity than prior large-scale looped recipes, and a 770M-parameter Parcae model matches the quality of a 1.3B-parameter Transformer. The research also establishes the first scaling laws for looping, showing that compute-optimal training requires increasing both looping and data together. By emphasizing recurrence over pure data scaling, the approach opens an efficient scaling frontier for memory-constrained on-device models.

Key takeaway

For machine learning engineers building memory-constrained on-device language models, Parcae offers a stable, parameter-efficient architecture. It can reach the quality of a standard Transformer with substantially fewer parameters (for example, a 770M Parcae model matches a 1.3B Transformer). Use its scaling laws to balance recurrence against data when planning training, and start from the released code and models to accelerate development.

Key insights

Parcae stabilizes looped language models, enabling efficient quality scaling for memory-constrained devices by increasing recurrence.

Principles

Looped models can scale quality without inflating memory footprint.
Training stability in looped models depends on keeping the spectral radius of the discretized injection matrix below one: $\rho(\bar{A}) < 1$.
Optimal looped model training scales recurrence and data together.
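The first principle is simple arithmetic: a looped model stores its weights once but re-applies a recurrent block several times, so compute and effective depth grow with the loop count while parameter memory stays fixed. The sketch below assumes a hypothetical prelude / recurrent block / coda split and made-up layer sizes; it is not Parcae's published configuration, and it ignores activation and KV-cache memory, which can also grow with looping.

```python
# Hypothetical layer sizes for illustration; NOT Parcae's published configuration.

def block_params(d_model: int) -> int:
    """Approximate weights in one Transformer block:
    attention projections (4 * d^2) + 4x-expansion MLP (8 * d^2).
    Norms, biases, and embeddings are ignored."""
    return 12 * d_model * d_model


def looped_cost(d_model: int, n_prelude: int, n_recur: int, n_coda: int,
                loops: int, tokens: int) -> tuple[int, int, int]:
    """Return (stored params, effective depth, approx. forward FLOPs)."""
    per_block = block_params(d_model)
    # Weights are stored once, no matter how many times the recurrent block runs.
    params = (n_prelude + n_recur + n_coda) * per_block
    # The recurrent block is re-applied `loops` times, deepening the computation.
    effective_depth = n_prelude + loops * n_recur + n_coda
    # Rule of thumb: ~2 FLOPs per weight per token in a matmul-dominated forward pass.
    flops = 2 * effective_depth * per_block * tokens
    return params, effective_depth, flops


if __name__ == "__main__":
    for loops in (1, 2, 4, 8):
        params, depth, flops = looped_cost(1024, 2, 4, 2, loops, tokens=1)
        print(f"loops={loops}: params={params:,} depth={depth} flops/token={flops:,}")
```

Stored parameters stay constant across loop counts while FLOPs per token rise linearly, which is why the third principle treats recurrence as a compute axis to be scaled jointly with data.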

Method

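Parcae stabilizes looped models by parameterizing the input-injection matrices $A$ and $B$ with $A := \mathrm{Diag}(-\exp(\log A))$, which keeps $A$ diagonal with strictly negative entries so that, after discretization, $\rho(\bar{A}) < 1$. This is combined with other training techniques.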
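To see why a negative diagonal $A$ yields $\rho(\bar{A}) < 1$, here is a minimal NumPy sketch. It assumes a zero-order-hold style discretization $\bar{A} = \exp(\Delta A)$ (as in state-space models) and keeps only the linear injection part of the recurrence, $h \leftarrow \bar{A} h + B e$, omitting the Transformer block itself. The discretization choice and all function names are assumptions for illustration, not Parcae's actual code.

```python
import numpy as np

# Minimal sketch under stated assumptions: diagonal A, ZOH-style discretization
# A_bar = exp(delta * A), and only the linear injection part of the recurrence
# (the Transformer block itself is omitted). Names here are hypothetical.

def discretize(log_A: np.ndarray, delta: float) -> np.ndarray:
    """Diagonal of A_bar for A := Diag(-exp(log_A))."""
    A = -np.exp(log_A)          # strictly negative for any real log_A
    return np.exp(delta * A)    # each entry in (0, 1) when delta > 0 (exact arithmetic)


def spectral_radius(diag: np.ndarray) -> float:
    """Eigenvalues of a diagonal matrix are its diagonal entries."""
    return float(np.max(np.abs(diag)))


def run_loop(A_bar: np.ndarray, B: np.ndarray, e: np.ndarray, loops: int) -> np.ndarray:
    """Iterate h <- A_bar * h + B * e, re-injecting the input e every loop."""
    h = np.zeros_like(e)
    for _ in range(loops):
        h = A_bar * h + B * e
    return h


if __name__ == "__main__":
    d = 16
    log_A = np.linspace(-2.0, 2.0, d)   # learnable in practice; fixed here
    B = np.ones(d)
    e = np.ones(d)                      # stand-in for the embedded input

    A_bar = discretize(log_A, delta=1.0)
    print("stable rho:", spectral_radius(A_bar))
    print("stable |h|:", np.linalg.norm(run_loop(A_bar, B, e, loops=100)))

    # An unconstrained diagonal with entries > 1 violates rho < 1.
    A_bad = np.full(d, 1.1)
    print("unstable rho:", spectral_radius(A_bad))
    print("unstable |h|:", np.linalg.norm(run_loop(A_bad, B, e, loops=100)))
```

With the constrained parameterization the state converges to the fixed point $B e / (1 - \bar{A})$ no matter how many loops run, while the unconstrained diagonal grows geometrically, a toy version of the residual state explosion described above.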

In practice

Use Parcae for efficient on-device language models.
Explore recurrence scaling for better FLOP efficiency.
Access Parcae training code and models on Hugging Face.

Topics

Parcae
Looped Language Models
Parameter Efficiency
Model Stability
On-device AI
Scaling Laws

Code references

sandyresearch/parcae

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Together AI | The AI Native Cloud - Together.ai.