😺 SubQ ships 12M tokens at 1/5 the cost
Summary
Subquadratic, a new lab backed by $25 million in seed funding, has launched SubQ, a large language model (LLM) built on a sub-quadratic architecture. The architecture, called Subquadratic Selective Attention (SSA), scales linearly with input length and runs 52 times faster than FlashAttention at 1 million tokens. SubQ has a native 12-million-token context window and costs approximately one-fifth as much as current frontier models. On long-context accuracy it scored 97% on RULER 128K, ahead of Opus 4.6's 94%. On multi-needle retrieval it scored 83 on MRCR v2, ahead of Opus (78), GPT-5.4 (39), and Gemini 3.1 Pro (23). SubQ is available as a 12M-token API and as SubQ Code, a CLI agent for repository loading, and the lab plans to reach 100M tokens by Q4.
Key takeaway
For CTOs and VPs of Engineering evaluating LLM infrastructure, Subquadratic's SubQ model presents a compelling alternative to traditional Transformer architectures. Its sub-quadratic scaling and native 12M-token context window significantly reduce the operational cost and complexity of context workarounds such as retrieval-augmented generation (RAG). Consider piloting SubQ's API or CLI agent on workflows that demand extensive context, which could simplify your AI stack and improve cost-efficiency for long-document processing and code analysis.
Key insights
Subquadratic's new LLM, SubQ, offers a 12M-token context window at 1/5 the cost, challenging traditional Transformer limitations.
Principles
- Linear scaling improves LLM cost-efficiency.
- Native long-context architectures reduce engineering overhead.
Method
SubQ utilizes a Subquadratic Selective Attention (SSA) architecture that scales linearly with input length, enabling a 12M-token context window at significantly lower cost and higher speed than O(n²) Transformer models.
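The practical difference between quadratic and linear scaling can be shown with simple arithmetic. The sketch below is not SSA itself, whose internals are not described here. It only compares how an idealized O(n²) cost and an idealized O(n) cost grow when the input goes from 1 million to 12 million tokens. The function names and the unit-cost model are assumptions made for illustration.

```python
# Idealized cost models (assumptions for illustration, not SubQ's actual costs).

def quadratic_cost(n_tokens: int) -> int:
    """Pairwise attention: work grows with the square of the input length."""
    return n_tokens * n_tokens


def linear_cost(n_tokens: int) -> int:
    """Linear-scaling attention: work grows in proportion to the input length."""
    return n_tokens


def growth_factor(cost_fn, base_tokens: int, target_tokens: int) -> float:
    """How many times more work the target length needs than the base length."""
    return cost_fn(target_tokens) / cost_fn(base_tokens)


base, target = 1_000_000, 12_000_000
quad_growth = growth_factor(quadratic_cost, base, target)
lin_growth = growth_factor(linear_cost, base, target)

print(f"quadratic: {quad_growth:.0f}x more work")
print(f"linear:    {lin_growth:.0f}x more work")
```

Under these idealized models, a 12x longer input costs 144x more with quadratic scaling but only 12x more with linear scaling, which is why the gap widens as context windows grow.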
In practice
- Use SubQ's 12M-token API for cost-effective long-context applications.
- Use the SubQ Code CLI for single-pass repository loading in development.
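Before trying single-pass repository loading, it can help to estimate whether a codebase fits in a 12M-token window. The sketch below does not call SubQ's API or CLI. It walks a local directory and applies a rough heuristic of about four characters per token, which is an assumption and will vary by tokenizer and language. The function names, the file-extension filter, and the `CHARS_PER_TOKEN` constant are all hypothetical.

```python
import os

CHARS_PER_TOKEN = 4          # rough heuristic (assumption); real tokenizers differ
CONTEXT_BUDGET = 12_000_000  # the 12M-token window cited in the summary


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of a string, rounding up."""
    return -(-len(text) // chars_per_token)


def estimate_repo_tokens(root: str, extensions=(".py", ".md", ".txt")) -> int:
    """Sum estimated tokens over all files under root with matching extensions."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                # Ignore undecodable bytes so one odd file does not abort the scan.
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    total += estimate_tokens(fh.read())
    return total


def fits_in_context(token_count: int, budget: int = CONTEXT_BUDGET) -> bool:
    """Return True if the estimate is within the context budget."""
    return token_count <= budget
```

Calling `fits_in_context(estimate_repo_tokens("path/to/repo"))` gives a quick yes-or-no answer. Because the estimate is coarse, leave headroom for the prompt and the model's output.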
Topics
- Subquadratic LLM
- Sub-quadratic Architecture
- Long Context Windows
- AI Lawsuits
- Prompt Engineering
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Engineer, General Interest
Editorial summary, takeaway, and curation by AIssential. Original article published by The Neuron.