Thinking

· Source: Ollama Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, short

Summary

Ollama has introduced a new "thinking" feature, allowing users to enable or disable a model's internal thought process for various applications. When enabled, the output clearly separates the model's thinking steps from its final answer, which can enhance accuracy or create dynamic user experiences like animating an NPC's thought bubble. Conversely, disabling thinking provides faster, direct responses, catering to use cases where speed is paramount. This functionality is currently supported by models such as DeepSeek R1 and Qwen 3, with broader integration planned for other "thinking models." Users can control this behavior via the Ollama CLI, interactive sessions, scripting, and through updates to both the generate and chat APIs, as well as the Python and JavaScript client libraries.

Key takeaway

Ollama now enables explicit control over a model's "thinking" process, separating internal reasoning from the final output for models like DeepSeek R1 and Qwen 3. This feature, accessible via CLI, API, and client libraries, allows users to prioritize either enhanced accuracy through transparent thought steps or faster, direct responses. It offers critical flexibility for AI/ML professionals to optimize model behavior for applications ranging from complex problem-solving with animated reasoning to rapid, concise answer generation.

Topics

Code references

Best for: Machine Learning Engineer, AI Engineer, Software Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Ollama Blog.