RAG Tool Call for gpt-oss-chat
Summary
This article extends "gpt-oss-chat" with local Retrieval-Augmented Generation (RAG) exposed as a tool call, building on earlier work that set up the base chat and added web search. Instead of running RAG on every turn, the assistant decides from the user query and chat history when to search local documents. Three changes make this possible: the system prompt is updated to describe the new `local_rag` tool, the `tools/tools.py` module defines and handles the `local_rag` function, and the `api_call.py` script is adjusted to manage multiple tool choices and stream responses. The system runs against a `llama.cpp` server with a context length of 32,000 tokens and ingests a PDF to build an in-memory Qdrant vector database. Letting the model act as a router over the available tools is intended to save context and reduce response time.
Key takeaway
For AI engineers building chatbots, exposing RAG as a tool call rather than running it on every turn avoids unnecessary retrieval and keeps irrelevant context out of the prompt. The model can route each query to local documents or to web search as needed. Consider a modular tool-management layer so the setup scales as more tools are added and the assistant can keep choosing the most suitable resource for each interaction.
Key insights
Exposing RAG as a tool call lets the LLM choose the relevant information source for each query, which saves context and shortens response time.
Principles
- Chatbots benefit from multiple, selectable tools.
- Dynamic tool selection saves context and time.
- The assistant acts as a router for the available tools.
Method
The method has three parts: update the system prompt, define tool functions such as `local_rag` in `tools/tools.py`, and modify the chat loop in `api_call.py` to process tool calls from the event stream.
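The chat-loop part of the method can be sketched as follows. This is a minimal illustration, not the article's code: it assumes the server streams tool calls in the OpenAI-compatible chat-completions shape (fragments carrying an `index`, a `function.name`, and pieces of `function.arguments`), and the helper names `collect_tool_calls`, `run_tool_calls`, and `TOOL_REGISTRY` are hypothetical. The `local_rag` body is a stub standing in for the real retrieval function.

```python
import json


def local_rag(topic: str, top_k: int = 3) -> str:
    """Stub for the real retrieval function (assumption: signature is illustrative)."""
    return f"[stub] top {top_k} chunks for '{topic}'"


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"local_rag": local_rag}


def collect_tool_calls(deltas):
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls = {}
    for delta in deltas:
        for frag in delta.get("tool_calls") or []:
            slot = calls.setdefault(frag["index"], {"name": "", "arguments": ""})
            fn = frag.get("function", {})
            if fn.get("name"):
                slot["name"] = fn["name"]
            # Argument JSON arrives in pieces and must be concatenated.
            slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


def run_tool_calls(calls):
    """Dispatch each completed call to its registered function."""
    results = []
    for call in calls:
        fn = TOOL_REGISTRY.get(call["name"])
        if fn is None:
            results.append(f"Unknown tool: {call['name']}")
            continue
        args = json.loads(call["arguments"] or "{}")
        results.append(fn(**args))
    return results


# Simulated stream: the tool name arrives first, then the arguments in two pieces.
deltas = [
    {"tool_calls": [{"index": 0, "function": {"name": "local_rag", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"topic": "qdr'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": 'ant", "top_k": 2}'}}]},
    {"content": "plain text delta, no tool call"},
]
calls = collect_tool_calls(deltas)
results = run_tool_calls(calls)
```

In a full chat loop, each result would be appended to the message history as a tool message and the model called again to produce the final answer.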
In practice
- Pass `--rag-tool <path/to/document.pdf>` to enable the RAG tool.
- Run `llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 32000` to start the server.
- Define tool parameters such as `top_k` and `topic` for the RAG tool.
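One way the `--rag-tool` flag and the `top_k` and `topic` parameters could fit together is sketched below. The schema follows the common OpenAI-style function-calling format, which is an assumption about what the project uses; the descriptions, the `top_k` default, and the helper names `build_parser` and `active_tools` are hypothetical.

```python
import argparse

# Assumed JSON-schema tool definition; descriptions and defaults are illustrative.
LOCAL_RAG_TOOL = {
    "type": "function",
    "function": {
        "name": "local_rag",
        "description": "Search the locally ingested document for relevant passages.",
        "parameters": {
            "type": "object",
            "properties": {
                "topic": {
                    "type": "string",
                    "description": "What to look up in the local document.",
                },
                "top_k": {
                    "type": "integer",
                    "description": "Number of chunks to retrieve.",
                    "default": 3,
                },
            },
            "required": ["topic"],
        },
    },
}


def build_parser():
    """Command-line parser exposing the --rag-tool flag."""
    parser = argparse.ArgumentParser(description="Chat CLI sketch with an optional RAG tool")
    parser.add_argument(
        "--rag-tool",
        metavar="PDF_PATH",
        default=None,
        help="Path to a PDF; when set, the local_rag tool is offered to the model.",
    )
    return parser


def active_tools(args):
    """Offer local_rag only when a document path was supplied."""
    return [LOCAL_RAG_TOOL] if args.rag_tool else []


args = build_parser().parse_args(["--rag-tool", "document.pdf"])
tools = active_tools(args)
```

Offering the tool only when a document is supplied keeps the model from calling a retrieval function that has nothing to search.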
Topics
- Retrieval-Augmented Generation
- Tool Calling
- LLM Agents
- gpt-oss-chat
- Qdrant
Best for: AI Engineer, Machine Learning Engineer, AI Chatbot Developer
Editorial summary, takeaway, and curation by AIssential. Original article published by DebuggerCafe.