ParaRNN: Large-Scale Nonlinear RNNs, Trainable in Parallel

Source: Apple Machine Learning Research · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, long

Summary

Apple researchers have developed ParaRNN, an open-source framework that enables parallel training of nonlinear Recurrent Neural Networks (RNNs) for large language models. Published in April 2026 and accepted to ICLR 2026, ParaRNN achieves a 665x speedup over traditional sequential RNN training, making it feasible to train 7-billion-parameter classical RNNs. The resulting large-scale ParaGRU and ParaLSTM models reach language modeling performance competitive with transformers and Mamba2 while keeping the constant-time token generation and memory efficiency of RNNs at inference. To address the historical training bottleneck of RNNs, the framework adapts Newton's method to linearize the nonlinear recurrence so that it can be computed in parallel, and it relies on structured Jacobians and custom CUDA kernels to keep that computation efficient.

Key takeaway

For NLP engineers and research scientists designing LLMs, ParaRNN offers a compelling alternative to attention-based architectures. Your teams can now train billion-parameter nonlinear RNNs with competitive performance and significantly faster inference throughput, especially for resource-constrained deployments. Consider integrating the open-source ParaRNN framework to explore novel recurrent architectures and potentially reduce operational costs for high-volume inference.

Key insights

ParaRNN enables parallel training of large-scale nonlinear RNNs, reaching language modeling performance competitive with transformers while keeping the constant-time, memory-efficient inference of RNNs.

Principles

Method

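ParaRNN treats the hidden states of the whole sequence as the unknowns of a single system of nonlinear equations and solves that system iteratively with Newton's method. Each Newton iteration linearizes the recurrence around the current estimate using its Jacobians, which yields a linear recurrence that can be solved in parallel with techniques similar to those used for linear SSMs.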
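The idea can be illustrated with a minimal NumPy sketch. This is not the ParaRNN code or its API. The toy cell h_t = tanh(a * h_{t-1} + x_t), the function names, and the simple doubling scan are all assumptions chosen for illustration. The cell is elementwise, so its Jacobian is diagonal, a simple stand-in for the structured Jacobians mentioned in the summary. The vectorized scan stands in for the custom CUDA kernels.

```python
import numpy as np

def step(h_prev, x_t, a):
    """Toy nonlinear cell (illustrative assumption): h_t = tanh(a * h_{t-1} + x_t)."""
    return np.tanh(a * h_prev + x_t)

def sequential_rnn(x, a):
    """Reference: apply the cell one time step at a time, starting from h_0 = 0."""
    h = np.zeros(x.shape[1])
    states = []
    for x_t in x:
        h = step(h, x_t, a)
        states.append(h)
    return np.stack(states)

def linear_scan(A, b):
    """Solve d_t = A_t * d_{t-1} + b_t for all t, with a zero state before t = 0.

    Uses a doubling (Hillis-Steele) scan: about log2(T) passes, each one a
    vectorized operation over the whole sequence. A and b have shape (T, D),
    and A holds the diagonal of each Jacobian.
    """
    A = A.copy()
    b = b.copy()
    T = A.shape[0]
    shift = 1
    while shift < T:
        # Compose the affine map ending at t with the one ending at t - shift.
        # b is updated first because it needs the not-yet-updated A.
        b[shift:] = A[shift:] * b[:-shift] + b[shift:]
        A[shift:] = A[shift:] * A[:-shift]
        shift *= 2
    return b

def newton_parallel_rnn(x, a, tol=1e-10):
    """Find all hidden states at once by Newton's method on H_t - step(H_{t-1}, x_t) = 0."""
    T, D = x.shape
    H = np.zeros((T, D))  # initial guess for every hidden state
    for n_updates in range(T + 1):
        H_prev = np.vstack([np.zeros((1, D)), H[:-1]])  # h_{t-1} for every t
        f_val = step(H_prev, x, a)
        residual = H - f_val
        if np.max(np.abs(residual)) < tol:
            break
        # Diagonal Jacobian of the cell with respect to h_{t-1}.
        A = a * (1.0 - f_val ** 2)
        # Newton update: delta_t = A_t * delta_{t-1} - residual_t, a linear recurrence.
        H = H + linear_scan(A, -residual)
    return H, n_updates

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))
a = np.array([0.2, 0.5, 0.7, 0.9])
H_newton, n_updates = newton_parallel_rnn(x, a)
print("Newton updates:", n_updates)
print("max abs difference:", np.max(np.abs(H_newton - sequential_rnn(x, a))))
```

Each Newton update fixes at least one more leading time step exactly, so the loop needs at most T updates. In this toy setting it typically needs far fewer, which is what makes trading a length-T sequential loop for a handful of parallel scans attractive. The doubling scan shown here performs O(T log T) work, and a production implementation would presumably use a more work-efficient scan.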

Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Apple Machine Learning Research.