OpenAI gpt-oss
Summary
OpenAI has released two new open-weight models, gpt-oss 20B and gpt-oss 120B, in partnership with Ollama, making them available for local deployment as of August 5, 2025. These models are designed for powerful reasoning, agentic tasks, and versatile developer use cases, featuring native function calling, web browsing, Python tool calls, and structured outputs. They support full chain-of-thought access and configurable reasoning effort (low, medium, high). The models are fine-tunable and released under the permissive Apache 2.0 license. To reduce memory footprint, OpenAI utilizes MXFP4 quantization for the mixture-of-experts (MoE) weights, which constitute over 90% of parameters, enabling the 20B model to run on 16GB memory and the 120B model on a single 80GB GPU. Ollama natively supports this MXFP4 format, with new kernels developed for its engine, and has collaborated with NVIDIA to accelerate gpt-oss performance on GeForce RTX and RTX PRO GPUs.
Key takeaway
For AI/ML Directors evaluating new local deployment options, OpenAI's gpt-oss models offer a compelling solution due to their agentic capabilities, Apache 2.0 license, and efficient MXFP4 quantization. You should consider integrating these models, especially the 20B version for specialized tasks or the 120B for general-purpose production, to leverage powerful reasoning on existing NVIDIA RTX hardware. This partnership with Ollama and NVIDIA simplifies deployment and ensures performance.
Key insights
OpenAI's gpt-oss models offer powerful local AI with agentic features and efficient quantization under an Apache 2.0 license.
Principles
- Quantization significantly reduces memory footprint.
- Open-source models foster broad utility and customization.
Method
The gpt-oss models use MXFP4 quantization for mixture-of-experts weights, reducing memory to enable local execution on GPUs with 16GB or 80GB memory, supported natively by Ollama's new engine.
In practice
- Use `ollama run gpt-oss:20b` for lower latency tasks.
- Employ `ollama run gpt-oss:120b` for high reasoning production use.
Topics
- OpenAI gpt-oss
- Large Language Models
- Model Quantization
- Agentic AI
- Ollama Platform
Best for: CTO, VP of Engineering/Data, Director of AI/ML, Machine Learning Engineer, AI Engineer, Software Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Ollama Blog.