Liquid AI Releases LFM2.5-8B-A1B: An On-Device MoE Model With 8.3B Total and 1.5B Active Parameters
Summary
Liquid AI released LFM2.5-8B-A1B, an on-device Mixture-of-Experts (MoE) model, which activates only 1.5B of its 8.3B total parameters per token. This version introduces reasoning-only capabilities, producing an explicit chain of thought, and demonstrates a significant reduction in hallucination rates, with the Non-Hallucination Rate improving from 7.46 to 63.47, IFEval from 79.44 to 91.84, MATH500 from 74.80 to 88.76, and Tau² Telecom from 13.60 to 88.07. The model runs efficiently on various hardware, achieving 253 tok/s on an M5 Max under 6 GB, ~30 tok/s on a phone, and 18.5K tok/s on a single H100. It emphasizes tool calling, exemplified by the LocalCowork demo running 67 tools across 13 MCP servers on one laptop without cloud dependencies. The model offers open weights and day-one support for llama.cpp, MLX, vLLM, and SGLang.
Key takeaway
For AI engineers building local agents or on-device AI applications, you should consider LFM2.5-8B-A1B. Its efficient MoE architecture, combined with significantly reduced hallucination and robust tool-calling capabilities, offers a compelling solution for privacy-preserving, high-performance local AI. Evaluate its integration to develop agents that operate entirely on-device, minimizing cloud dependencies and data egress.
Key insights
LFM2.5-8B-A1B is an on-device MoE model that significantly reduces hallucinations and enables complex tool-calling.
Principles
- MoE architecture enables cheap reasoning tokens.
- Targeted RL reward can train models to abstain on unknown questions.
Method
A targeted avg@k RL reward trains the model to abstain on questions beyond its knowledge, significantly reducing hallucination rates.
In practice
- Run 67 tools across 13 MCP servers on one laptop.
- Utilize llama.cpp, MLX, vLLM, SGLang for deployment.
Topics
- Mixture-of-Experts
- On-device AI
- Hallucination Reduction
- Tool Calling
- Local Agents
- Model Inference
Best for: AI Architect, NLP Engineer, Research Scientist, AI Engineer, Machine Learning Engineer, AI Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.