Web Search Tool with Streaming in gpt-oss-chat

· Source: DebuggerCafe · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, medium

Summary

This article details an incremental improvement to the `gpt-oss-chat` project, specifically integrating web search as an autonomous tool call capability. Instead of manual user activation, the `gpt-oss-20b` model now intelligently decides when to use web search based on the prompt and chat history, and it generates the search query. The update addresses issues like unnecessary web searches and inaccurate queries in multi-turn conversations. Key changes involve defining the web search tool in `tools.py` and significantly modifying `api_call.py` to handle streaming tokens during tool calls. This includes detecting tool call initiation, incrementally capturing arguments, and preserving dangling content for the final assistant response. The project also maintains its local RAG capabilities using in-memory Qdrant DB.

Key takeaway

For AI Engineers building conversational agents, integrating autonomous tool calling with streaming requires meticulous handling of chat history and token processing. Ensure your assistant messages explicitly declare tool intent via the `tool_calls` field, and carefully reconstruct tool arguments from the token stream. This approach allows models like `gpt-oss-20b` to dynamically enhance responses with real-time information, significantly improving user experience and accuracy in complex queries.

Key insights

LLMs can autonomously decide to use web search and generate queries, enhancing conversational AI.

Principles

Method

Implement tool calls by defining JSON schemas and Python functions, then manage streaming token detection, argument capture, and chat history updates within the chat loop.

In practice

Topics

Code references

Best for: AI Engineer, Machine Learning Engineer, Software Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by DebuggerCafe.