RAG Is Not Machine Learning, and the ML Toolkit Solves the Wrong Problem

Source: Towards Data Science · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Advanced, extended

Summary

The article argues that RAG is not machine learning and that applying the traditional ML toolkit to RAG problems is a costly misconception. An ML model predicts an answer; a RAG system finds an answer that already exists in the documents. The author details how common ML practices are misapplied in RAG: hyperparameter optimization of settings such as chunk size and top-k, aggregate evaluation datasets, and feature-attribution explainability. RAG systems improve instead through engineering work: better parsing, precise retrieval, and clear prompting. The piece frames RAG as a search engine combined with an LLM that generates the answer, where the system's intelligence resides in the development team's domain expertise, not in the model. A case study describes six months of ML-focused work that failed to address a fundamental parsing issue, which illustrates why a structural, engineering-centric approach matters.

Key takeaway

AI engineers and MLOps teams building RAG systems should recognize that RAG is an engineering assembly problem, not a model training problem. Stop tuning "hyperparameters" such as chunk size with ML tools; instead, design retrieval strategies structurally, based on document and question types. Evaluate specific failure modes, such as parsing errors or retrieval recall, rather than aggregate accuracy, so that issues can be diagnosed and fixed efficiently. This approach prevents wasted effort and produces more robust systems.
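A minimal sketch of what per-failure-mode evaluation could look like, as opposed to a single accuracy score. It is not taken from the original article: the `EvalCase` fields, the stage names, and the substring check used to detect whether the answer survived each stage are all assumptions made for illustration, and a real pipeline would need a more tolerant matching rule.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One labeled question. All field names are hypothetical."""
    question: str
    gold_passage: str      # text that must be available to answer correctly
    parsed_corpus: str     # document text as it looks after parsing
    retrieved: list[str]   # chunks returned by the retriever
    answer_correct: bool   # judged correctness of the final answer


def classify_failure(case: EvalCase) -> str:
    """Attribute a case to the earliest pipeline stage that lost the answer."""
    # Assumption: exact substring match stands in for "the passage survived".
    if case.gold_passage not in case.parsed_corpus:
        return "parsing"      # the answer never survived document parsing
    if not any(case.gold_passage in chunk for chunk in case.retrieved):
        return "retrieval"    # parsed correctly, but never retrieved
    if not case.answer_correct:
        return "generation"   # retrieved, but the final answer was wrong
    return "ok"


def failure_report(cases: list[EvalCase]) -> dict[str, int]:
    """Count cases per failure mode instead of reporting one aggregate number."""
    return dict(Counter(classify_failure(c) for c in cases))
```

Reading the report by stage points to where the fix belongs: a large "parsing" count means no amount of retrieval or prompt tuning will help, which is the situation the article's case study describes.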

Key insights

RAG is an engineering problem, not a machine learning problem, requiring search system assembly and domain expertise.

Method

Improve RAG by routing different question types to specific retrieval strategies, favoring structural decisions over numerical optimization. Measure each failure mode separately rather than relying on a single aggregate score.
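The routing idea can be sketched as a small dispatch table. This is an illustration under stated assumptions, not the article's implementation: the three question types, the regular expressions that detect them, and the stub retrievers (`keyword_lookup`, `table_lookup`, `vector_search`) are all hypothetical placeholders for whatever a team's documents and questions actually require.

```python
import re
from typing import Callable

Retriever = Callable[[str], list[str]]


# Hypothetical stub retrievers; a real system would query an index here.
def keyword_lookup(question: str) -> list[str]:
    return [f"keyword:{question}"]


def table_lookup(question: str) -> list[str]:
    return [f"table:{question}"]


def vector_search(question: str) -> list[str]:
    return [f"vector:{question}"]


def classify_question(question: str) -> str:
    """Assign a question type using explicit, inspectable rules."""
    q = question.lower()
    if re.search(r"\b(section|clause|article)\s+\d+", q):
        return "reference"   # asks about a numbered part of a document
    if re.search(r"\b(how many|how much|total|average)\b", q):
        return "tabular"     # asks for a figure likely stored in a table
    return "semantic"        # everything else: open-ended lookup


# The structural decision: which strategy serves which question type.
ROUTES: dict[str, Retriever] = {
    "reference": keyword_lookup,
    "tabular": table_lookup,
    "semantic": vector_search,
}


def route(question: str) -> tuple[str, list[str]]:
    """Return the detected question type and that strategy's results."""
    kind = classify_question(question)
    return kind, ROUTES[kind](question)
```

The rules are deliberately written by hand: each one encodes a piece of domain knowledge that can be read, debated, and corrected, which is the kind of structural decision the method favors over tuning a number.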

Best for: AI Engineer, MLOps Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.