OpenAI compatibility

· Source: Ollama Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, quick

Summary

Ollama, a platform for running large language models locally, announced built-in compatibility with the OpenAI Chat Completions API on February 8, 2024. This integration allows users to leverage existing OpenAI tooling and applications with local Ollama models like Llama 2 or Mistral. Setup involves downloading Ollama and pulling a model, then invoking the API endpoint by changing the hostname to `http://localhost:11434` in cURL requests. The compatibility extends to the OpenAI Python and JavaScript libraries, requiring only a `base_url` modification and a placeholder `api_key`. Practical examples demonstrate integration with the Vercel AI SDK for conversational streaming applications and Microsoft's Autogen framework for multi-agent systems, using models like Code Llama. Future enhancements may include Embeddings API, function calling, vision support, and logprobs.

Key takeaway

For AI Engineers building or testing applications that rely on the OpenAI Chat Completions API, you can now seamlessly switch to local Ollama models like Llama 2 or Mistral by simply reconfiguring the API base URL. This enables local development and testing without external API calls, reducing costs and improving privacy, especially when iterating on multi-agent systems or conversational UIs.

Key insights

Ollama now supports the OpenAI Chat Completions API, enabling local LLM use with existing OpenAI-compatible tools.

Principles

Method

Configure OpenAI client libraries or cURL requests to point to `http://localhost:11434/v1` and use 'ollama' as the API key to interact with local Ollama models.

In practice

Topics

Code references

Best for: AI Engineer, Machine Learning Engineer, Software Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Ollama Blog.