Build 100% Local Advanced RAG System for Financial PDFs with Qwen 3.5 | Docling, LangGraph & Ollama

Source: Venelin Valkov · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Intermediate, long

Summary

This content details the construction of a fully local Retrieval-Augmented Generation (RAG) system for analyzing financial documents. The system uses Docling for PDF processing, Ollama for local LLM inference, and LangChain with LangGraph for workflow orchestration. A Streamlit front end handles document upload and chat, and it communicates with a FastAPI backend that manages the ingestion, retrieval, and inference layers. The ingestion pipeline uses a custom chunker and stores document chunks in a Supabase database, with the pgvector extension holding their embeddings. The inference layer serves LLMs through Ollama, while the retrieval layer applies query expansion and reranks results with FlashRank. Because the entire architecture runs locally, documents can be analyzed privately and offline; the system is demonstrated on Apple and Nvidia earnings reports.
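
The custom chunker itself is not described in this summary, so the following is only a minimal, hypothetical sketch of one common approach: split the markdown produced by a converter such as Docling at heading boundaries, pack paragraphs up to a size limit, and record each chunk's heading path as metadata. The names chunk_markdown, Chunk, and max_chars are illustrative assumptions, not the author's code.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Matches ATX-style markdown headings such as "## Revenue".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


@dataclass
class Chunk:
    text: str
    headings: list[str] = field(default_factory=list)  # heading path, outermost first


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[Chunk]:
    """Split markdown into chunks that never cross a heading boundary.

    Paragraphs under the same heading are packed together until adding the
    next one would exceed max_chars. A single paragraph longer than
    max_chars is kept whole rather than cut mid-sentence.
    """
    chunks: list[Chunk] = []
    path: list[str] = []          # current heading path
    paragraphs: list[str] = []    # finished paragraphs under the current heading
    lines: list[str] = []         # lines of the paragraph being read

    def end_paragraph() -> None:
        if lines:
            paragraphs.append("\n".join(lines))
            lines.clear()

    def flush() -> None:
        # Pack the paragraphs collected under the current heading into chunks.
        end_paragraph()
        text = ""
        for para in paragraphs:
            if text and len(text) + 2 + len(para) > max_chars:
                chunks.append(Chunk(text, list(path)))
                text = para
            else:
                text = f"{text}\n\n{para}" if text else para
        if text:
            chunks.append(Chunk(text, list(path)))
        paragraphs.clear()

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2))
        elif line.strip():
            lines.append(line.rstrip())
        else:
            end_paragraph()  # a blank line closes the current paragraph
    flush()
    return chunks
```

Keeping the heading path with each chunk lets a later enrichment or retrieval step know which section of an earnings report (for example, revenue versus risk factors) the text came from, even when the chunk itself does not repeat that context.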

Key takeaway

For AI engineers building secure, offline document-analysis solutions, this local RAG architecture offers a practical blueprint. Consider Docling for accurate PDF parsing and Supabase with pgvector for chunk storage and vector retrieval. Running LLMs locally through Ollama and orchestrating the workflow with LangGraph gives you control over every stage of the pipeline and keeps sensitive financial data inside your local environment.
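
In pgvector, the <=> operator returns cosine distance (1 minus cosine similarity), so a nearest-neighbor lookup over stored chunks typically orders rows by embedding <=> query_embedding and takes the first k. The pure-Python sketch below reproduces that ranking on in-memory rows to show what the database computes; the row layout and the names cosine_distance and nearest_chunks are hypothetical, and a real deployment would run the query inside Postgres, usually with a vector index.

```python
import math
from typing import Sequence

Row = tuple[str, Sequence[float]]  # (chunk_id, embedding); hypothetical layout


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def nearest_chunks(query: Sequence[float], rows: list[Row], k: int = 3) -> list[str]:
    """Return the ids of the k rows closest to the query, nearest first."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

Cosine distance ignores vector length, which suits embedding models whose outputs differ mainly in direction; if the embeddings are already normalized, inner product gives the same ordering.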

Key insights

A local RAG system built from open-source tools can analyze financial PDFs while keeping data private and working offline.

Method

The system converts PDFs to markdown, splits the markdown into chunks, enriches each chunk with an LLM, and stores the chunks in a Supabase database with pgvector enabled. At query time, a LangGraph workflow expands the query, runs hybrid retrieval, and reranks the results.
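
The summary mentions hybrid retrieval over expanded queries but not how the resulting lists are combined. One common choice, assumed here rather than taken from the source, is reciprocal rank fusion (RRF): each chunk scores the sum of 1/(k + rank) over every ranked list it appears in, with k = 60 by convention. In this sketch the query variants arrive already expanded (in the described system an LLM would generate them), the keyword retriever is a toy term-overlap scorer standing in for full-text search, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable

# A retriever maps a query string to chunk ids, best match first.
Retriever = Callable[[str], list[str]]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each id scores sum(1 / (k + rank)), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def make_keyword_retriever(corpus: dict[str, str]) -> Retriever:
    """Toy stand-in for full-text search: rank chunks by shared lowercase terms."""
    def retrieve(query: str) -> list[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(text.lower().split())), cid) for cid, text in corpus.items()]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [cid for overlap, cid in scored if overlap > 0]
    return retrieve


def hybrid_retrieve(query_variants: list[str], retrievers: list[Retriever], top_n: int = 5) -> list[str]:
    """Run every retriever on every query variant, then fuse all rankings with RRF."""
    rankings = [retrieve(q) for q in query_variants for retrieve in retrievers]
    return reciprocal_rank_fusion(rankings)[:top_n]
```

RRF needs only ranks, not raw scores, so keyword and vector results can be merged without calibrating their score scales against each other. The fused top chunks would then go to a reranker such as FlashRank before generation; that step is omitted here.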

Best for: AI Engineer, Software Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.