😺 SubQ ships 12M tokens at 1/5 the cost

2026-04-30 · Source: The Neuron · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, extended

Summary

Subquadratic, a new lab backed by $25 million in seed funding, has launched SubQ, a large language model (LLM) built on a sub-quadratic architecture. This architecture, called Subquadratic Selective Attention (SSA), scales linearly with input length and runs 52 times faster than FlashAttention at 1 million tokens. SubQ boasts a native 12-million-token context window, operating at approximately one-fifth the cost of current frontier models. It achieved a 97% score on RULER 128K for long-context accuracy, surpassing Opus 4.6's 94%, and scored 83 on MRCR v2 for multi-needle retrieval, outperforming Opus (78), GPT-5.4 (39), and Gemini 3.1 Pro (23). SubQ offers a 12M-token API and SubQ Code, a CLI agent for repository loading, with plans to reach 100M tokens by Q4.

Key takeaway

For CTOs and VPs of Engineering evaluating LLM infrastructure, Subquadratic's SubQ model presents a compelling alternative to traditional Transformer architectures. Its sub-quadratic scaling and native 12M-token context window could significantly reduce the operational cost and complexity of context workarounds such as RAG. Consider piloting SubQ's API or CLI agent on workflows that demand extensive context, such as long-document processing and code analysis, where it may simplify your AI stack and improve cost-efficiency.

Key insights

Subquadratic's new LLM, SubQ, offers a 12M-token context window at 1/5 the cost, challenging traditional Transformer limitations.

Principles

Linear scaling improves LLM cost-efficiency.
Native long-context architectures reduce engineering overhead.

Method

SubQ uses a Subquadratic Selective Attention (SSA) architecture whose compute scales linearly with input length, rather than quadratically as in standard Transformer attention (O(n²)). This enables a 12M-token context window at significantly lower cost and higher speed.
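To make the scaling argument concrete, here is a minimal sketch that compares how attention cost grows under a quadratic model and a toy linear model. It is not SSA's actual mechanism, which the source does not describe. The function names and the fixed per-token budget `k` are hypothetical, and constant factors are ignored.

```python
def quadratic_cost(n: int) -> int:
    """Full pairwise attention: every token attends to every other token."""
    return n * n


def linear_cost(n: int, k: int = 1024) -> int:
    """Toy linear model: each token attends to a fixed budget of k tokens.

    k is a hypothetical illustration value, not a published SubQ parameter.
    """
    return n * k


def growth(cost_fn, n_small: int, n_large: int) -> float:
    """How many times more work n_large tokens need than n_small tokens."""
    return cost_fn(n_large) / cost_fn(n_small)


if __name__ == "__main__":
    small, large = 128_000, 12_000_000  # context sizes mentioned in the summary
    print("quadratic growth:", growth(quadratic_cost, small, large))
    print("linear growth:   ", growth(linear_cost, small, large))
```

Under these assumptions, going from 128K to 12M tokens multiplies the input length by 93.75. The linear model's work grows by that same factor, while the quadratic model's work grows by its square.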

In practice

Utilize SubQ's 12M-token API for cost-effective long-context applications.
Employ SubQ Code CLI for single-pass repository loading in development.
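Before piloting single-pass repository loading, it can help to check whether a codebase plausibly fits in a 12M-token window. The sketch below estimates this using a rough 4-characters-per-token heuristic. That ratio is an assumption, not SubQ's tokenizer, and the function names and file extensions are hypothetical. It does not call the SubQ API or the SubQ Code CLI.

```python
import os

CHARS_PER_TOKEN = 4          # rough heuristic (assumption), varies by tokenizer
CONTEXT_WINDOW = 12_000_000  # window size reported in the summary


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate token count by ceiling-dividing the character count."""
    return -(-len(text) // chars_per_token)


def estimate_repo_tokens(root: str, extensions=(".py", ".md", ".ts")) -> int:
    """Sum estimated tokens over files under root with matching extensions."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                # Ignore undecodable bytes so stray binary data cannot abort the scan.
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total += estimate_tokens(f.read())
    return total


def fits_in_window(token_count: int, window: int = CONTEXT_WINDOW) -> bool:
    """True if the estimated token count fits within the context window."""
    return token_count <= window


if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    print(f"estimated tokens: {tokens}, fits: {fits_in_window(tokens)}")
```

The estimate is only a first-pass filter. An actual pilot should measure token counts with the provider's own tokenizer.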

Topics

Subquadratic LLM
Sub-quadratic Architecture
Long Context Windows
AI Lawsuits
Prompt Engineering

Code references

obsidianmd/obsidian-cli

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Engineer, General Interest

Editorial summary, takeaway, and curation by AIssential. Original article published by The Neuron.