PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend
Summary
PaddleOCR 3.5, released on May 18, 2026, integrates its OCR and document parsing models with the Hugging Face Transformers ecosystem, allowing supported PaddleOCR models like PP-OCRv5 and PaddleOCR-VL 1.5 to use Transformers as an inference backend. This update introduces a flexible `engine` parameter, enabling developers to select "transformers" as the backend and configure options like `dtype`, device placement, and attention implementation via `engine_config`. While PaddleOCR continues to manage the underlying OCR and document parsing pipelines, this integration simplifies connecting these capabilities with existing PyTorch/Transformers infrastructure, particularly for applications involving RAG, Document AI, and document agents. A live demo is available on Hugging Face Spaces.
Key takeaway
For AI engineers building RAG, Document AI, or agent applications on a Hugging Face-centered stack, PaddleOCR 3.5 reduces integration friction: supported models such as PP-OCRv5 can run on a Transformers backend, so OCR and document parsing fit into the same PyTorch/Transformers infrastructure as the rest of the workflow, from document ingestion to LLM processing. If raw throughput is your primary concern, consider the `paddle_static` backend instead.
Key insights
PaddleOCR 3.5 enables using Hugging Face Transformers as an inference backend for its OCR and document parsing models.
Principles
- Flexible inference backends enhance developer choice.
- Integration friction impedes downstream AI workflows.
Method
Install PaddleOCR 3.5, PaddleX, Transformers, and PyTorch. Initialize PaddleOCR with `engine="transformers"` and optionally configure `engine_config` for backend-specific settings like `dtype` or `attn_implementation`.
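The method above can be sketched roughly as follows. This is a minimal sketch, assuming the `PaddleOCR` constructor accepts `engine` and `engine_config` as the summary describes. `build_ocr_kwargs` and `create_ocr` are hypothetical helper names, not part of PaddleOCR. The `"sdpa"` value is a standard Transformers attention option, and the summary does not confirm it for this backend.

```python
from typing import Any, Dict, Optional

# Packages named in the summary (PyPI names assumed; version pins not given):
#   pip install paddleocr paddlex transformers torch


def build_ocr_kwargs(
    engine: str = "transformers",
    dtype: Optional[str] = None,
    attn_implementation: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a PaddleOCR(...) call (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"engine": engine}

    # Collect only the backend-specific options that were explicitly set.
    engine_config: Dict[str, Any] = {}
    if dtype is not None:
        engine_config["dtype"] = dtype
    if attn_implementation is not None:
        engine_config["attn_implementation"] = attn_implementation

    # Pass engine_config only when there is something to configure.
    if engine_config:
        kwargs["engine_config"] = engine_config
    return kwargs


def create_ocr(**kwargs: Any) -> Any:
    """Build the pipeline; assumes PaddleOCR 3.5 accepts these keyword arguments."""
    # Imported lazily so the helpers above work where paddleocr is not installed.
    from paddleocr import PaddleOCR

    return PaddleOCR(**kwargs)


kwargs = build_ocr_kwargs(dtype="bfloat16", attn_implementation="sdpa")
print(kwargs)

# With the packages installed, the pipeline could then be created and run:
# ocr = create_ocr(**kwargs)
# result = ocr.predict("sample.png")  # "sample.png" is a placeholder path
```

Keeping the argument assembly separate from the constructor call makes it easy to log or inspect the chosen backend settings before any model weights are loaded.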
In practice
- Use `engine="transformers"` for Hugging Face-centered stacks.
- Configure `engine_config` for `dtype` (e.g., "bfloat16").
- Prioritize `paddle_static` for maximum throughput.
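These rules of thumb could be written down as a small selection helper. This is only an illustrative sketch: `choose_engine` and its two flags are hypothetical names, and returning `None` (meaning "leave the `engine` argument unset") when neither condition applies is an assumption, since the summary does not state the library's default backend.

```python
from typing import Optional


def choose_engine(hf_centered_stack: bool, throughput_first: bool) -> Optional[str]:
    """Map deployment priorities to an engine name (hypothetical helper)."""
    # The summary recommends paddle_static when maximum throughput matters most.
    if throughput_first:
        return "paddle_static"
    # Otherwise, a Hugging Face-centered stack points to the Transformers backend.
    if hf_centered_stack:
        return "transformers"
    # Neither applies: return None so the caller keeps the library default.
    return None


print(choose_engine(hf_centered_stack=True, throughput_first=False))
```

Letting the throughput check win over the stack check reflects the summary's advice that `paddle_static` is the choice when raw throughput is the primary concern.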
Topics
- PaddleOCR 3.5
- Hugging Face Transformers
- Document Parsing
- Optical Character Recognition
- Inference Backend
Best for: AI Engineer, Machine Learning Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.