Liquid AI's smallest model yet, LFM2.5-230M, beats models 4X its size at data extraction, can run 'anywhere'

· Source: VentureBeat · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Data Science & Analytics · Depth: Intermediate, short

Summary

Liquid AI, founded by former MIT computer scientists, released its LFM2.5-230M language model on June 25, 2026. The 230-million-parameter foundation model is designed explicitly for on-device agentic workflows and local deployment on smartphones, laptops, and robots. On data extraction and tool-use benchmarks such as BFCLv3 and CaseReportBench, it reportedly outperforms models roughly 3.5X to 4.3X its size, including the 800-million-parameter Alibaba Qwen3.5-0.8B and the 1-billion-parameter Google Gemma 3 1B. LFM2.5-230M uses Liquid AI's LFM2 hybrid architecture, which combines gated short-range convolutions with grouped-query attention, supports a 32K-token context window, and keeps the memory footprint under 400MB. It reaches decode speeds of 213 tokens per second on a Samsung Galaxy S25 Ultra and 42 tokens per second on a Raspberry Pi 5. The model is released under a dual-use commercial license: it is free for entities with under $10 million in annual revenue, while larger enterprises require a paid agreement.
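The headline figures can be sanity-checked with simple arithmetic. The Python sketch below converts the reported parameter count into weights-only memory at a few common storage precisions (which precision the sub-400MB figure refers to is not stated, so these precisions are assumptions), turns the reported decode rates into per-token latency, and computes the size ratios against the two comparison models.

```python
PARAMS = 230e6  # LFM2.5-230M parameter count, per the article

# Bytes per weight at common storage precisions. Which precision the
# "under 400MB" figure assumes is not stated; these values are illustrative.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}

# Reported decode throughput, in tokens per second.
DECODE_TPS = {"Samsung Galaxy S25 Ultra": 213, "Raspberry Pi 5": 42}

# Comparison models named in the article, by parameter count.
BASELINES = {"Qwen3.5-0.8B": 0.8e9, "Gemma 3 1B": 1.0e9}


def weights_mb(params, bytes_per_param):
    """Weights-only size in megabytes (1 MB = 1e6 bytes); ignores caches and runtime overhead."""
    return params * bytes_per_param / 1e6


def ms_per_token(tokens_per_second):
    """Average time to generate one token at a steady decode rate."""
    return 1000.0 / tokens_per_second


for name, b in BYTES_PER_PARAM.items():
    print(f"{name:>9}: {weights_mb(PARAMS, b):6.0f} MB of weights")
for device, tps in DECODE_TPS.items():
    print(f"{device}: {ms_per_token(tps):.1f} ms per token")
for name, p in BASELINES.items():
    print(f"{name}: {p / PARAMS:.1f}x the parameters")
```

At 16-bit precision the weights alone come to about 460 MB, so the sub-400MB footprint presumably reflects a build stored below 16 bits per weight; the summary does not say which format was measured. The ratios also show that only Gemma 3 1B is strictly more than 4X larger, while Qwen3.5-0.8B is about 3.5X.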

Key takeaway

For AI engineers and Directors of AI/ML evaluating on-device AI, LFM2.5-230M is worth a close look. If your team needs to automate data extraction or run agentic, tool-calling workflows on edge hardware, this 230-million-parameter model reportedly scores better than models several times its size on those tasks, and running it locally can cut cloud compute costs and network latency. Benchmark it on your own extraction and tool-use tasks, on the actual smartphones, robots, or other constrained devices you plan to target, before committing to a deployment.

Key insights

Liquid AI's LFM2.5-230M shows that a small, efficient model can outperform much larger ones on specific on-device data extraction and tool-use tasks.

Principles

Method

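The LFM2.5-230M model uses the hybrid LFM2 architecture, which interleaves gated short-range convolution layers, which mix information among nearby tokens cheaply, with grouped-query attention layers, in which several query heads share each key/value head. This combination supports a 32K-token context window at low compute and memory cost.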
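The interleaving of the two layer types can be illustrated with a toy forward pass. The Python sketch below is not Liquid AI's implementation: all sizes, the layer order, and the exact gating form (an input gate, a causal depthwise convolution of width 3, then an output gate) are assumptions, and a real model would also include normalization, output projections, and feed-forward layers, which are omitted here. It shows the two ideas the Method line names: convolution layers that look only at the last few tokens, and grouped-query attention in which several query heads share one key/value head. The helper `decode_state_floats` is hypothetical and only counts values kept between decode steps.

```python
import math
import random

# Toy sizes chosen for illustration only; not LFM2.5-230M's real configuration.
D = 8           # model width
KERNEL = 3      # short-range conv kernel width (assumed)
N_Q_HEADS = 4   # query heads
N_KV_HEADS = 2  # key/value heads, each shared by a group of query heads (GQA)
HEAD_DIM = D // N_Q_HEADS


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


def gated_short_conv(seq, params):
    """Gate the input, apply a causal depthwise conv of width KERNEL, gate the output."""
    W_b, W_c, W_x, kernel = params
    gated = [[b * v for b, v in zip(matvec(W_b, x), matvec(W_x, x))] for x in seq]
    out = []
    for t, x in enumerate(seq):
        conv = [0.0] * D
        for k in range(min(KERNEL, t + 1)):  # current token and up to KERNEL - 1 previous ones
            for d in range(D):
                conv[d] += kernel[k][d] * gated[t - k][d]
        out.append([c * v for c, v in zip(matvec(W_c, x), conv)])
    return out


def grouped_query_attention(seq, params):
    """Causal attention where each KV head serves N_Q_HEADS // N_KV_HEADS query heads."""
    W_q, W_k, W_v = params
    qs = [matvec(W_q, x) for x in seq]
    ks = [matvec(W_k, x) for x in seq]
    vs = [matvec(W_v, x) for x in seq]
    group = N_Q_HEADS // N_KV_HEADS
    out = []
    for t in range(len(seq)):
        row = []
        for h in range(N_Q_HEADS):
            kv = h // group  # index of the shared key/value head
            q = qs[t][h * HEAD_DIM:(h + 1) * HEAD_DIM]
            scores = [
                sum(a * b for a, b in zip(q, ks[s][kv * HEAD_DIM:(kv + 1) * HEAD_DIM]))
                / math.sqrt(HEAD_DIM)
                for s in range(t + 1)  # causal: attend to positions 0..t only
            ]
            m = max(scores)
            weights = [math.exp(sc - m) for sc in scores]  # numerically stable softmax
            z = sum(weights)
            head = [0.0] * HEAD_DIM
            for s, w in enumerate(weights):
                v = vs[s][kv * HEAD_DIM:(kv + 1) * HEAD_DIM]
                for d in range(HEAD_DIM):
                    head[d] += (w / z) * v[d]
            row.extend(head)
        out.append(row)
    return out


def hybrid_forward(seq, layers):
    """Run conv and attention layers in order, each with a residual connection."""
    for kind, params in layers:
        block = gated_short_conv if kind == "conv" else grouped_query_attention
        seq = [[a + b for a, b in zip(x, y)] for x, y in zip(seq, block(seq, params))]
    return seq


def decode_state_floats(context_len, kind):
    """Hypothetical helper: values one layer keeps between decode steps."""
    if kind == "attn":
        return context_len * 2 * N_KV_HEADS * HEAD_DIM  # K/V cache grows with context
    return (KERNEL - 1) * D  # conv only needs the last KERNEL - 1 gated inputs


rng = random.Random(0)
kv_width = N_KV_HEADS * HEAD_DIM
layers = [
    ("conv", (rand_matrix(D, D, rng), rand_matrix(D, D, rng),
              rand_matrix(D, D, rng), rand_matrix(KERNEL, D, rng))),
    ("conv", (rand_matrix(D, D, rng), rand_matrix(D, D, rng),
              rand_matrix(D, D, rng), rand_matrix(KERNEL, D, rng))),
    ("attn", (rand_matrix(D, D, rng), rand_matrix(kv_width, D, rng),
              rand_matrix(kv_width, D, rng))),
]
tokens = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(5)]
out = hybrid_forward(tokens, layers)
print(len(out), len(out[0]))  # 5 8
print(decode_state_floats(32_768, "conv"), decode_state_floats(32_768, "attn"))  # 16 262144
```

The last line hints at one plausible reason such a hybrid can stay small on device: a convolution layer keeps a fixed-size buffer however long the context grows, while each attention layer's key/value cache grows with the context, and GQA shrinks that cache by storing fewer key/value heads. Whether this accounts for LFM2.5-230M's specific memory figure is not stated in the source.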

Topics

Best for: AI Architect, NLP Engineer, CTO, AI Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.