A startup claims it broke through a bottleneck that’s holding back LLMs

2026-06-19 · Source: MIT Technology Review · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, medium

Summary

AI startup Subquadratic has emerged from stealth claiming to have resolved a decade-long mathematical bottleneck in large language models with SubQ, a new LLM architecture. SubQ reportedly uses sparse attention with dynamic selection, which the company says makes it faster, cheaper, and more energy-efficient than current models. The company asserts that SubQ can process up to 12 times more text, with a 12-million-token context window, and that it matches top models from Google DeepMind, OpenAI, and Anthropic on tasks such as coding. An independent evaluation by Appen supports these claims: it found SubQ to be 56 times faster than FlashAttention, with a score of 89.7% on LiveCodeBench and 98% on needle-in-a-haystack tests at both 6-million- and 12-million-token context lengths. Skepticism remains, given the model's limited availability and its reuse of Qwen weights, but Subquadratic aims to redefine how LLMs are built.

Key takeaway

For AI Scientists and Machine Learning Engineers evaluating LLM architectures for long-context or cost-sensitive applications, Subquadratic's SubQ is a potential paradigm shift. Its dynamic sparse attention mechanism promises much faster and cheaper processing of large inputs and may point beyond standard transformer-based models. Watch closely for its wider availability and further independent validation, particularly if your use cases demand very long context windows or substantial reductions in operating costs.

Key insights

Subquadratic's SubQ LLM uses dynamic sparse attention to overcome the quadratic scaling bottleneck of dense attention, offering significant efficiency gains.

Principles

Dense attention scores every token against every other token, so its computation grows quadratically with text length.
Sparse attention can drastically reduce computation by scoring only a selected subset of token pairs.
Dynamic selection of token relationships is key for effective sparse attention.
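The scaling gap behind these principles can be made concrete by counting query-key score computations. The minimal Python sketch below assumes a fixed per-query budget `k` of attended keys; the function names and the value `k = 4096` are hypothetical illustrations, not parameters Subquadratic has disclosed.

```python
def dense_scores(n: int) -> int:
    """Query-key pairs scored by dense attention over n tokens (per head, per layer)."""
    # Every one of the n queries scores every one of the n keys.
    return n * n


def sparse_scores(n: int, k: int) -> int:
    """Query-key pairs scored when each query attends to at most k keys."""
    # Hypothetical fixed budget: each query scores min(k, n) keys.
    return n * min(k, n)


if __name__ == "__main__":
    k = 4096  # hypothetical per-query budget, not a disclosed SubQ setting
    for n in (128_000, 1_000_000, 12_000_000):
        d, s = dense_scores(n), sparse_scores(n, k)
        print(f"n={n:>12,}  dense={d:.2e}  sparse={s:.2e}  ratio~{d / s:,.0f}x")
```

Because the ratio between the two counts is n / k, the savings widen as context grows: at 12 million tokens, dense attention would score 1.44 × 10^14 pairs per head per layer, while the sparse count grows only linearly in n.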

Method

SubQ replaces the transformer's dense attention with a sparse attention mechanism whose token relationships are selected dynamically. It chooses the relevant token relationships on the fly, avoiding the quadratic growth in computation that traditional LLMs incur.
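One general way to select token relationships dynamically without first scoring every pair is block-level routing: each query scores a cheap summary of each key block, keeps its best few blocks, and runs exact attention only inside them. The Python sketch below implements that general technique under stated assumptions (mean-pooled block summaries, a fixed `top_k`, one query at a time); it is not SubQ's published mechanism, and every function name in it is hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: List[Vector]) -> Vector:
    # Element-wise mean of equally sized vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def make_blocks(n: int, block_size: int) -> List[range]:
    # Contiguous key blocks; the last block may be shorter.
    return [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]


def block_summaries(keys: List[Vector], blocks: List[range]) -> List[Vector]:
    # Hypothetical routing summary: the mean key of each block.
    # Computed once per sequence and shared by every query.
    return [mean_vector([keys[i] for i in blk]) for blk in blocks]


def block_sparse_attention(q: Vector, keys: List[Vector], values: List[Vector],
                           blocks: List[range], summaries: List[Vector],
                           top_k: int) -> Vector:
    """Attend from one query to the keys in its top_k highest-scoring blocks."""
    # 1. Routing: one cheap score per block instead of one per key.
    block_scores = [dot(q, s) for s in summaries]
    # 2. Dynamic selection: this query keeps its own top_k blocks.
    ranked = sorted(range(len(blocks)), key=lambda b: block_scores[b], reverse=True)
    idx = [i for b in sorted(ranked[:top_k]) for i in blocks[b]]
    # 3. Exact scaled dot-product softmax attention over the selected keys only.
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [sum(w * values[i][d] for w, i in zip(weights, idx)) / total
            for d in range(len(values[0]))]


def dense_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    # A single block that is always selected reduces to full (dense) attention.
    blocks = make_blocks(len(keys), len(keys))
    return block_sparse_attention(q, keys, values, blocks,
                                  block_summaries(keys, blocks), top_k=1)
```

With block size B and n keys, each query computes n / B routing scores plus top_k × B exact scores instead of n, so the total over all queries stays well below n² when B and top_k are small relative to n. Using a single block with top_k = 1 recovers dense attention, which makes a convenient correctness check.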

In practice

Process hundreds of documents or entire codebases efficiently.
Achieve frontier-level performance on competitive coding problems.
Retrieve specific information from contexts of up to 12 million tokens.
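Long-context retrieval claims such as the needle-in-a-haystack results are typically checked by hiding a known fact at varying depths in filler text and asking the model to recall it. The sketch below assumes a generic `model` callable (prompt in, text out) and counts filler in sentences rather than tokens; `toy_model` is a hypothetical stand-in for a real LLM call, and the harness is not Appen's evaluation protocol.

```python
import re
from typing import Callable, List


def build_haystack(needle: str, filler: str, n_filler: int, depth: float) -> str:
    """Hide `needle` at relative position `depth` (0.0 = start, 1.0 = end) in filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler] * n_filler
    sentences.insert(round(depth * n_filler), needle)
    return " ".join(sentences)


def needle_accuracy(model: Callable[[str], str], needle: str, answer: str,
                    question: str, filler: str, n_filler: int,
                    depths: List[float]) -> float:
    """Fraction of insertion depths at which the model's reply contains `answer`."""
    hits = 0
    for depth in depths:
        prompt = build_haystack(needle, filler, n_filler, depth) + "\n\n" + question
        if answer.lower() in model(prompt).lower():
            hits += 1
    return hits / len(depths)


def toy_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call: finds the passkey with a regex.
    match = re.search(r"passkey is (\d+)", prompt)
    return match.group(1) if match else "unknown"


if __name__ == "__main__":
    acc = needle_accuracy(toy_model, "The secret passkey is 7319.", "7319",
                          "What is the secret passkey?", "The grass is green.",
                          n_filler=1000, depths=[0.0, 0.25, 0.5, 0.75, 1.0])
    print(f"accuracy: {acc:.0%}")
```

A real run would replace `toy_model` with an actual model call, size the filler in tokens using the model's tokenizer, and sweep both insertion depth and total context length, for example up to the 6-million- and 12-million-token lengths reported for SubQ.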

Topics

Large Language Models
Sparse Attention
Transformer Architecture
Computational Efficiency
Context Window
AI Benchmarking
SubQ

Best for: AI Engineer, NLP Engineer, CTO, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by MIT Technology Review.