Cloud models
Summary
Ollama has launched a preview of its new cloud models, enabling users to run larger language models that typically exceed personal computer hardware capabilities. This service, released on September 19, 2025, integrates seamlessly with existing local Ollama tools and maintains user privacy by not retaining data. The cloud models are accessible via Ollama's OpenAI-compatible API and support standard Ollama commands like `run`, `pull`, `ls`, and `cp`. Initial available models include `qwen3-coder:480b-cloud`, `gpt-oss:120b-cloud`, `gpt-oss:20b-cloud`, and `deepseek-v3.1:671b-cloud`. Users need to download Ollama v0.12 and sign in to ollama.com to utilize these datacenter-grade hardware resources.
Key takeaway
For AI Engineers and Machine Learning Engineers needing to deploy or experiment with large language models, Ollama's new cloud models offer a practical solution. You can now access models up to 671B parameters without local hardware constraints, while preserving your existing Ollama workflows and ensuring data privacy. Consider integrating these cloud models to scale your LLM applications or research efforts efficiently.
Key insights
Ollama's cloud models allow running large language models with local tools and API compatibility, ensuring data privacy.
Principles
- Maintain local tool compatibility
- Prioritize user data privacy
- Scale model access via cloud
Method
Download Ollama v0.12, sign in via `ollama signin`, then use `ollama run` or `ollama pull` for cloud models, integrating with existing Ollama CLI and API workflows.
In practice
- Run `ollama run qwen3-coder:480b-cloud`
- Integrate with Python or JavaScript libraries
- Access via OpenAI-compatible API
Topics
- Cloud AI Models
- Ollama Platform
- Large Language Models
- Model Deployment
- API Integration
Best for: AI Engineer, Machine Learning Engineer, Software Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Ollama Blog.