Local LLMs Need More Than OpenAI-Compatible Endpoints
Summary
Respawn is an open-source local OpenAI-shaped API gateway designed to bridge the gap between local LLM inference backends like Ollama and the comprehensive platform features expected by modern clients. While local backends excel at token generation, they often lack capabilities such as stored response objects, "previous_response_id" for conversation continuity, normalized streaming events, tool-call protocol handling, file and image inputs, and background jobs. Respawn sits in front of these backends, providing a /v1 API surface that supports blocking, streaming, and background response flows, along with lifecycle endpoints for managing responses. It stores state in Postgres or SQLite, offers extensive observability metrics via VictoriaMetrics and Grafana, and has been tested with the OpenAI Python SDK and Codex locally, demonstrating its ability to integrate local models into complex software systems.
Key takeaway
For MLOps Engineers or Software Engineers integrating local LLMs into agents or internal services, relying solely on basic OpenAI-compatible endpoints is insufficient for robust applications. You should consider implementing a dedicated API gateway like Respawn to provide stateful API behavior, normalized streaming, and comprehensive observability. This approach ensures your local LLM stack meets modern client expectations, simplifies debugging, and allows for independent testing of API compatibility and inference performance.
Key insights
Local LLM platforms need a dedicated API gateway for stateful, OpenAI-compatible behavior beyond basic inference.
Principles
- Separate API gateway concerns from inference backend concerns.
- API compatibility requires explicit testing, not just "vibes".
- Preserve tool protocol shape; client owns function execution.
Method
Respawn acts as a gateway, intercepting OpenAI SDK requests, managing state (e.g., "previous_response_id"), normalizing streaming, and forwarding generation requests to local LLM backends like Ollama.
In practice
- Use Respawn to add stateful OpenAI API features to local Ollama deployments.
- Integrate local LLMs into agents and internal services expecting full API contracts.
- Leverage Respawn's observability for local LLM development workflows.
Topics
- Local LLMs
- API Gateway
- OpenAI API
- Ollama
- LLM Observability
- Tool Calling Protocol
Code references
Best for: AI Architect, Machine Learning Engineer, NLP Engineer, AI Engineer, Software Engineer, MLOps Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by HackerNoon.