😺 SubQ ships 12M tokens at 1/5 the cost

· Source: The Neuron · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, extended

Summary

Subquadratic, a new lab backed by $25 million in seed funding, has launched SubQ, a large language model (LLM) built on a sub-quadratic architecture. This architecture, called Subquadratic Selective Attention (SSA), scales linearly with input length and runs 52 times faster than FlashAttention at 1 million tokens. SubQ has a native 12-million-token context window and runs at approximately one-fifth the cost of current frontier models. It scored 97% on RULER 128K for long-context accuracy, ahead of Opus 4.6's 94%, and 83 on MRCR v2 for multi-needle retrieval, ahead of Opus (78), GPT-5.4 (39), and Gemini 3.1 Pro (23). Subquadratic offers a 12M-token API and SubQ Code, a CLI agent that loads entire repositories, and plans to reach 100M tokens by Q4.

Key takeaway

For CTOs and VPs of Engineering evaluating LLM infrastructure, Subquadratic's SubQ model is an alternative worth testing against traditional Transformer architectures. Its sub-quadratic scaling and native 12M-token context window could lower operating costs and remove some of the complexity of memory workarounds such as retrieval-augmented generation (RAG). Consider piloting the SubQ API or the SubQ Code CLI agent on workflows that demand extensive context, such as long-document processing and code analysis, to see whether it simplifies your AI stack and improves cost-efficiency.

Key insights

Subquadratic's new LLM, SubQ, offers a 12M-token context window at roughly one-fifth the cost of current frontier models, challenging the context-length limits of traditional Transformers.

Principles

Method

SubQ uses a Subquadratic Selective Attention (SSA) architecture that scales linearly with input length. This enables a 12M-token context window at significantly lower cost and higher speed than standard Transformer attention, whose cost grows as O(n²) in the number of input tokens n.
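
To make the scaling difference concrete, here is a minimal Python sketch that counts query-key score computations under two regimes: dense attention, where every token attends to every other token, and a generic selective scheme in which each token attends to at most a fixed number of keys. This is not Subquadratic's SSA algorithm, whose internals are not described here. The function names and the `budget` value of 4096 are hypothetical and chosen only for illustration.

```python
def full_attention_pairs(n: int) -> int:
    """Score computations for dense attention: each of n queries meets all n keys."""
    return n * n


def selective_attention_pairs(n: int, budget: int) -> int:
    """Score computations when each query attends to at most `budget` keys.

    Hypothetical stand-in for any linear-scaling attention scheme;
    it does not model how keys are selected.
    """
    return n * min(n, budget)


if __name__ == "__main__":
    budget = 4096  # assumed keys per query, not a published SubQ figure
    for n in (128_000, 1_000_000, 12_000_000):
        dense = full_attention_pairs(n)
        selective = selective_attention_pairs(n, budget)
        print(
            f"{n:>12,} tokens: dense={dense:.2e} "
            f"selective={selective:.2e} ratio={dense / selective:,.0f}x"
        )
```

Doubling the input length quadruples the dense count but only doubles the selective count, which is why the gap widens as context grows. The printed ratios are raw operation counts under the assumed budget, not predictions of wall-clock speed or of the 52x figure reported above.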

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Engineer, General Interest

Editorial summary, takeaway, and curation by AIssential. Original article published by The Neuron.