Latest open artifacts (#18): Arcee's 400B MoE, LiquidAI's underrated 1B model, new Kimi, and anticipation of a busy month
Summary
January 2026 saw a slower pace of open model releases than the previous year, though several noteworthy models emerged amid anticipation of upcoming releases such as DeepSeek V4 and Claude Sonnet 5. LiquidAI released LFM2.5-1.2B-Instruct, an update whose pretraining was extended from 10T to 28T tokens; it outperforms Qwen3 1.7B and approaches Qwen3 4B 2507 Instruct despite being less than a third of that model's size. Arcee-ai introduced Trinity-Large-Preview, an ultra-sparse Mixture-of-Experts (MoE) model with 400B total and 13B active parameters, accompanied by a technical report and base models. Moonshotai's Kimi-K2.5, a multimodal model continually pretrained on 15T tokens, showed strong coding and agentic abilities, with some users replacing Claude Opus 4.5 for specific tasks, though its writing capabilities reportedly suffered. Other releases included zai-org's GLM-4.7-Flash and LLM360's K2-Think-V2, a truly open reasoning model.
Key takeaway
NLP engineers evaluating open-source models for deployment should consider LiquidAI's LFM2.5-1.2B-Instruct, an efficient option that rivals larger models in performance. Its small size and strong capabilities, gained through extended pretraining, make it a compelling choice for resource-constrained environments or applications that need fast inference. Arcee's Trinity-Large-Preview is also worth investigating if you need a large MoE model that activates only 13B of its 400B parameters per token.
Key insights
Smaller open models are achieving competitive performance against larger counterparts through extensive pretraining and architectural innovations.
Principles
- Continued pretraining on more tokens can lift a model's capabilities, as LFM2.5's move from 10T to 28T tokens shows.
- Sparse MoE architectures enable large models with fewer active parameters.
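To make the sparsity principle concrete, the short Python sketch below computes the share of parameters that are active per token, using the 400B total and 13B active figures reported for Trinity-Large-Preview. The `top_k_experts` helper and its router scores are hypothetical illustrations of top-k routing in general, not Arcee's actual router, and both function names are assumptions introduced for this example.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a model's parameters used for each token."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    if not 0 <= active_params <= total_params:
        raise ValueError("active_params must lie between 0 and total_params")
    return active_params / total_params


def top_k_experts(router_scores: list[float], k: int) -> list[int]:
    """Toy top-k routing: indices of the k highest-scoring experts.

    Hypothetical illustration only; real routers operate on learned
    logits and usually add load-balancing terms.
    """
    ranked = sorted(range(len(router_scores)), key=lambda i: (-router_scores[i], i))
    return sorted(ranked[:k])


# Figures from the summary: 400B total, 13B active parameters.
trinity_fraction = active_fraction(400e9, 13e9)
print(f"Active share per token: {trinity_fraction:.2%}")

# Made-up router scores for four experts; only two are selected.
print(top_k_experts([0.1, 0.7, 0.05, 0.9], k=2))
```

Note that the active share is generally not just k divided by the number of experts, because components such as attention layers and embeddings typically run for every token regardless of routing.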
Method
LiquidAI's LFM2.5-1.2B-Instruct improved on its predecessor by extending pretraining from 10T to 28T tokens, which shows how much additional data can still benefit a small model.
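As a rough sense of scale, the sketch below works out the tokens-per-parameter ratio before and after the extension. It assumes the nominal 1.2B parameter count implied by the model name (the exact count may differ), and `tokens_per_parameter` is a helper name introduced here for illustration.

```python
def tokens_per_parameter(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    if params <= 0:
        raise ValueError("params must be positive")
    return tokens / params


# Assumption: nominal size taken from the "1.2B" in the model name.
NOMINAL_PARAMS = 1.2e9

before = tokens_per_parameter(10e12, NOMINAL_PARAMS)  # 10T tokens
after = tokens_per_parameter(28e12, NOMINAL_PARAMS)   # 28T tokens
growth = 28e12 / 10e12                                # relative increase in data

print(f"Before: ~{before:,.0f} tokens per parameter")
print(f"After:  ~{after:,.0f} tokens per parameter")
print(f"Token budget grew {growth:.1f}x")
```

Under that assumption, the model goes from roughly 8,300 to roughly 23,300 training tokens per parameter, a 2.8x increase in data at a fixed model size.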
In practice
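- Consider LFM2.5-1.2B-Instruct when you need strong quality from a roughly 1B-parameter model, for example in resource-constrained or latency-sensitive deployments.
- Explore Trinity-Large-Preview when you want a large sparse MoE model (400B total, 13B active parameters) with openly released base models and a technical report.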
Topics
- Open Model Releases
- Mixture-of-Experts
- Multimodal AI
- Language Models
- Continual Pretraining
Best for: NLP Engineer, Computer Vision Engineer, Research Scientist, AI Engineer, Machine Learning Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Interconnects AI.