Open model Kimi K2.7 Code undercuts GPT-5.5 and Claude by up to 12x on price per token

2026-06-13 · Source: The Decoder · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Intermediate, short

Summary

On June 13, 2026, Moonshot AI released Kimi K2.7 Code, an open-source, multimodal AI model built for complex programming tasks and agent-based workflows. Available on Hugging Face, this Mixture-of-Experts model has one trillion total parameters, with 32 billion active per token, and a 256,000-token context length. K2.7 Code trails Western competitors like GPT-5.5 and Claude Opus 4.8 on standard coding benchmarks (e.g., 53.6 on Program Bench compared to GPT-5.5's 69.1), but it performs strongly in agent-oriented tests and surpasses Claude Opus 4.8 on MCPMark Verified with a score of 81.1. Its API pricing is $0.95 per million input tokens and $4.00 per million output tokens, making it up to 12 times cheaper per token than models like Claude Fable 5. The model also includes a native INT4 quantization option and a modified MIT license for commercial use.
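To make the pricing concrete, here is a minimal sketch of the cost arithmetic using the two rates quoted above. The function name and the example workload (2 million input tokens, 300,000 output tokens) are hypothetical and chosen only for illustration; competitor rates are not given in this summary, so none are assumed.

```python
# Prices quoted in the summary, in USD per one million tokens.
INPUT_USD_PER_MTOK = 0.95
OUTPUT_USD_PER_MTOK = 4.00


def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_rate: float = INPUT_USD_PER_MTOK,
                 output_rate: float = OUTPUT_USD_PER_MTOK) -> float:
    """Return the API cost in USD for one workload at per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical agent run: 2M input tokens and 300k output tokens.
cost = api_cost_usd(2_000_000, 300_000)
print(f"${cost:.2f}")  # prints "$3.10"
```

Passing another model's published rates as `input_rate` and `output_rate` gives a like-for-like comparison on the same workload.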

Key takeaway

AI engineers and ML directors evaluating coding models for agent-based systems should benchmark Kimi K2.7 Code against their own tasks. Its API cost, up to 12x lower than alternatives like Claude Fable 5, can yield substantial savings even where raw performance trails top-tier models. Task-specific evaluations will show whether its "good enough" performance at a fraction of the cost makes it the right choice for high-volume agentic workflows.

Key insights

Kimi K2.7 Code is a low-cost, agent-optimized open-source model for complex coding, though it trails top models on standard coding benchmarks.

Principles

Cost-effectiveness can outweigh raw benchmark scores.
Agentic benchmarks reveal practical model strengths.
MoE architectures enable large parameter counts with efficient inference.

Method

The model uses a Mixture-of-Experts (MoE) architecture with 384 experts and one trillion total parameters, activating 8 experts per token. It also provides a "preserve_thinking" mode for agentic scenarios.
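As an illustration of what activating 8 of 384 experts per token can look like, the sketch below implements generic top-k gating in plain Python. This is a textbook formulation, not Moonshot AI's router, whose details are not described here; the function name, the softmax over the selected experts, and the random stand-in logits are all assumptions.

```python
import math
import random

NUM_EXPERTS = 384   # expert count from the summary
TOP_K = 8           # experts activated per token, from the summary


def route_token(router_logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    max_logit = router_logits[top[0]]  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for one token (hypothetical; a real router is a learned layer).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
print(len(selected), [index for index, _ in selected])
```

Only the selected experts run for that token, and their outputs are combined using the returned weights, which is why the active parameter count stays far below the total.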

In practice

Evaluate K2.7 Code for agent-based coding workflows.
Use INT4 quantization to reduce hardware requirements.
Compare cost-per-token against raw performance for specific tasks.
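For a rough sense of why the INT4 option matters for hardware, the following back-of-the-envelope sketch estimates weight storage from the parameter counts in the summary. It assumes every weight is stored at the stated precision and ignores quantization scales, KV cache, and activations; the 16-bit baseline is an assumption added for comparison, not a figure from the article.

```python
TOTAL_PARAMS = 1_000_000_000_000   # one trillion total parameters, from the summary
ACTIVE_PARAMS = 32_000_000_000     # 32 billion active per token, from the summary


def weight_bytes(num_params: int, bits_per_param: int) -> float:
    """Return the bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_param / 8


int4_gb = weight_bytes(TOTAL_PARAMS, 4) / 1e9
fp16_gb = weight_bytes(TOTAL_PARAMS, 16) / 1e9   # assumed 16-bit baseline
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"INT4: {int4_gb:.0f} GB, 16-bit: {fp16_gb:.0f} GB, active: {active_fraction:.1%}")
# prints "INT4: 500 GB, 16-bit: 2000 GB, active: 3.2%"
```

Under these simplifications, 4-bit weights need about a quarter of the memory of a 16-bit copy, while only about 3.2% of the parameters participate in any single token's forward pass.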

Topics

Kimi K2.7 Code
Open-source Models
Agentic AI
Code Generation
API Pricing
Mixture-of-Experts

Best for: CTO, VP of Engineering/Data, AI Architect, AI Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.