Multi-Turn Tool Call with gpt-oss-chat

· Source: DebuggerCafe · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, long

Summary

This article details the implementation of multi-turn tool calling capabilities within the `gpt-oss-chat` application, enabling a local AI assistant to autonomously utilize multiple tools like web search and local RAG (Retrieval Augmented Generation) within a single user turn. The system extends `gpt-oss-chat` to allow the assistant to make successive internal tool calls, such as searching user-uploaded documents and then the web, to synthesize comprehensive responses. The implementation involves minor logical changes in `api_call.py` and a modified system prompt in `utils/prompt.py`, which guides the assistant on when and how to use tools, including a `MAX_TOOL_CALLS` limit of 5 to prevent infinite loops. The `gpt-oss-20b` model is served using `llama.cpp` with a 32000 context length, demonstrating the workflow with a PDF document and web search APIs like Tavily and Perplexity.

Key takeaway

For AI Engineers developing local chat applications, implementing multi-turn tool calls significantly enhances assistant capabilities. You should focus on refining system prompts to guide tool selection and manage tool call limits to ensure efficient and relevant information retrieval. This approach allows your assistant to autonomously combine document retrieval and web search, providing more comprehensive answers to complex user queries.

Key insights

Local AI assistants can achieve multi-turn tool calling for complex queries by integrating web search and RAG.

Principles

Method

The method involves modifying the system prompt to guide tool usage, implementing a `MAX_TOOL_CALLS` limit, and iteratively calling tools based on stream events until a final response is generated or the limit is reached.

In practice

Topics

Code references

Best for: AI Engineer, Machine Learning Engineer, AI Chatbot Developer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by DebuggerCafe.