Web Search Tool with Streaming in gpt-oss-chat
Summary
This article details an incremental improvement to the `gpt-oss-chat` project, specifically integrating web search as an autonomous tool call capability. Instead of manual user activation, the `gpt-oss-20b` model now intelligently decides when to use web search based on the prompt and chat history, and it generates the search query. The update addresses issues like unnecessary web searches and inaccurate queries in multi-turn conversations. Key changes involve defining the web search tool in `tools.py` and significantly modifying `api_call.py` to handle streaming tokens during tool calls. This includes detecting tool call initiation, incrementally capturing arguments, and preserving dangling content for the final assistant response. The project also maintains its local RAG capabilities using in-memory Qdrant DB.
Key takeaway
For AI Engineers building conversational agents, integrating autonomous tool calling with streaming requires meticulous handling of chat history and token processing. Ensure your assistant messages explicitly declare tool intent via the `tool_calls` field, and carefully reconstruct tool arguments from the token stream. This approach allows models like `gpt-oss-20b` to dynamically enhance responses with real-time information, significantly improving user experience and accuracy in complex queries.
Key insights
LLMs can autonomously decide to use web search and generate queries, enhancing conversational AI.
Principles
- Tool calls require explicit `tool_calls` in assistant messages.
- Chat history must correctly reflect `system`, `user`, `assistant`, and `tool` roles.
Method
Implement tool calls by defining JSON schemas and Python functions, then manage streaming token detection, argument capture, and chat history updates within the chat loop.
In practice
- Define tool functions in `tools.py` with JSON schema.
- Handle streaming tool arguments token-by-token.
- Append assistant messages with `tool_calls` before tool results.
Topics
- gpt-oss-chat
- Tool Calling
- Web Search Integration
- Streaming API
- Chat History Management
Code references
Best for: AI Engineer, Machine Learning Engineer, Software Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by DebuggerCafe.