Convert Any Document To LLM Knowledge with Docling & Ollama (100% Local) | PDF to Markdown Pipeline

2025-12-22 · Source: Venelin Valkov · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, medium

Summary

A local, open-source pipeline is presented for converting complex PDF documents, including those with images and tables, into high-quality Markdown files suitable for Retrieval-Augmented Generation (RAG) applications. The pipeline uses IBM's open-source Docling library for document processing and Ollama to serve a visual language model (VLM). Specifically, it uses pypdfium2 to extract text from digital PDFs, a Table Transformer model to recognize table structure, and a local instance of the 2-billion-parameter Qwen3-VL model, served through Ollama, to generate image descriptions. Because everything runs locally, financial documents, which are often rich in tables and charts, never leave the machine. In the output, images are replaced by descriptive annotations and page breaks are preserved for subsequent RAG chunking.

Key takeaway

For AI engineers building RAG applications that rely on external documents, a local PDF-to-Markdown conversion pipeline built on Docling and Ollama can improve knowledge base quality. The approach extracts text and tables directly from digital PDFs and replaces images with VLM-generated descriptions, so content that would otherwise be lost to retrieval becomes searchable. Consider configuring the Qwen3-VL model for image annotation.

Key insights

A local pipeline converts complex PDFs to markdown with tables and image descriptions for RAG applications.

Principles

Prioritize digital PDF readers over OCR for accuracy.
Use specialized models for table structure recognition.
Replace images with VLM-generated descriptions for RAG.
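
As a rough illustration of the third principle, the sketch below builds a request for Ollama's native `/api/generate` endpoint, which accepts base64-encoded images alongside a prompt. The model tag `qwen3-vl:2b`, the prompt wording, and the helper names are assumptions for illustration, not details confirmed by the source.

```python
import base64
import json
import urllib.request

# Ollama's default local address; adjust if your instance listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_describe_payload(image_bytes: bytes, model: str = "qwen3-vl:2b") -> dict:
    """Build the JSON body for an image-description request.

    The model tag is an assumed example; use whichever VLM you have pulled.
    """
    return {
        "model": model,
        "prompt": "Describe this figure in a few factual sentences.",
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON response instead of a token stream.
        "stream": False,
    }


def describe_image(image_bytes: bytes, model: str = "qwen3-vl:2b") -> str:
    """Send the request to a running Ollama server and return the description."""
    body = json.dumps(build_describe_payload(image_bytes, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"].strip()
```

Calling `describe_image(open("chart.png", "rb").read())` requires a running Ollama server with the model already pulled; `chart.png` is a placeholder file name.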

Method

The pipeline uses Docling's DocumentConverter with pypdfium2 for PDF reading, a Table Transformer for table extraction, and an Ollama-hosted Qwen3-VL model for image description, and writes a Markdown file in which image descriptions appear as annotations.
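
One way this configuration might look in Docling is sketched below. It is a minimal sketch, not the author's code: the Ollama URL, the model tag `qwen3-vl:2b`, the prompt, the page-break placeholder, and the function names are assumptions, and option names can differ between Docling versions, so check them against the release you have installed.

```python
from pathlib import Path

# Assumed values: Ollama's OpenAI-compatible endpoint and an example model tag.
OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"
VLM_MODEL = "qwen3-vl:2b"
PAGE_BREAK = "<!-- page break -->"  # hypothetical placeholder string


def build_converter():
    """Create a Docling converter for digital PDFs with tables and image descriptions."""
    # Imported lazily so this module still loads where docling is not installed.
    from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        PdfPipelineOptions,
        PictureDescriptionApiOptions,
        TableFormerMode,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    # API-based picture description is only allowed when remote services are
    # enabled, even if the endpoint is on localhost.
    options = PdfPipelineOptions(enable_remote_services=True)
    options.do_ocr = False  # digital PDF: read the embedded text instead of OCR
    options.do_table_structure = True
    options.table_structure_options.mode = TableFormerMode.ACCURATE
    options.generate_picture_images = True  # keep image crops for the VLM
    options.do_picture_description = True
    options.picture_description_options = PictureDescriptionApiOptions(
        url=OLLAMA_CHAT_URL,
        params={"model": VLM_MODEL},
        prompt="Describe this figure in a few factual sentences.",
        timeout=120,
    )
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_options=options,
                backend=PyPdfiumDocumentBackend,
            )
        }
    )


def pdf_to_markdown(pdf_path: str, out_path: str) -> None:
    """Convert one PDF and write Markdown with page-break placeholders."""
    result = build_converter().convert(pdf_path)
    markdown = result.document.export_to_markdown(page_break_placeholder=PAGE_BREAK)
    Path(out_path).write_text(markdown, encoding="utf-8")
```

Whether the generated descriptions appear inline in the exported Markdown may depend on the installed Docling version, so inspect the output on a sample document before indexing it.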

In practice

Integrate Docling and Ollama for local PDF processing.
Configure Qwen3-VL for image description generation.
Preserve page breaks for effective RAG chunking.
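
To show why preserved page breaks help with chunking, here is a small sketch that splits the exported Markdown into per-page chunks. It assumes the export marks page breaks with the placeholder string `<!-- page break -->`; both the placeholder and the function name are hypothetical.

```python
PAGE_BREAK = "<!-- page break -->"  # assumed placeholder written at export time


def split_pages(markdown: str, marker: str = PAGE_BREAK) -> list[dict]:
    """Split Markdown on the page-break marker, keeping 1-based page numbers.

    Pages that are empty after stripping whitespace are skipped, but the
    numbering of the remaining pages is not shifted.
    """
    pages = []
    for number, text in enumerate(markdown.split(marker), start=1):
        text = text.strip()
        if text:
            pages.append({"page": number, "text": text})
    return pages
```

Each chunk carries its page number, so a RAG system can cite the page an answer came from. Long pages can still be split further by heading or token count.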

Topics

Document Processing
RAG Applications
Visual Language Models
Table Extraction
Docling Library

Best for: AI Engineer, Machine Learning Engineer, AI Chatbot Developer

Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.