Building a Tool-Augmented RAG Agent with Session Memory

2026-04-20 · Source: Towards AI - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, medium

Summary

This article, Part 5 of a series on production-grade RAG systems, details building a tool-augmented RAG agent with session memory. It explains how to promote a hybrid search, semantic chunking, parent-child indexing, and custom reranking pipeline into a callable tool. The process involves defining the `rag_search` function with Pydantic's `Annotated` and `Field` for machine-readable schemas, registering it with a stateful agent, and backing it with a local Llama 3.2 model via Ollama. The agent uses session memory to maintain conversation context across turns, enabling it to answer follow-up questions without re-querying the knowledge base, while also demonstrating how explicit re-queries and topic switches trigger new RAG calls. The architecture emphasizes observability, allowing direct tracing of agent answers to specific retrieved chunks.

Key takeaway

For AI Engineers building conversational RAG systems, integrating tool-augmented agents with session memory is crucial for handling multi-turn interactions. You should define your RAG pipeline as a typed tool with clear descriptions, use a local LLM like Llama 3.2 for efficient inference, and implement session memory to maintain conversational context, ensuring your agent can answer follow-up questions accurately and traceably.

Key insights

Tool-augmented RAG agents with session memory enable multi-turn, context-aware conversations by dynamically calling a knowledge base.

Principles

Tool schemas must be machine-readable.
Deterministic responses require low model temperature.
Session memory maintains conversation context.

Method

Define tools with Pydantic for schema, initialize an agent with a local LLM (e.g., Llama 3.2 via Ollama) and tools, then manage conversation state using session memory for multi-turn interactions.

In practice

Use `Annotated` and `Field` for tool parameter descriptions.
Set LLM `temperature=0.0` for retrieval-grounded agents.
Extend agents with multiple tools like `nutrition_lookup`.

Topics

Tool-Augmented RAG
Session Memory
Llama 3.2
Ollama
Pydantic

Best for: AI Engineer, Machine Learning Engineer, Software Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.