SAFE-Cascade: Cost-Adaptive Vision-Language Routing for Chart Question Answering

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

SAFE-Cascade is an interactive system for cost-adaptive chart question answering that avoids invoking a vision-language model (VLM) on every query. It first extracts chart text with Azure Document Intelligence OCR, then obtains a provisional answer from a text-only language model, gpt-5-mini. A Random Forest router trained on inference-time features then decides whether to accept the text answer or escalate the query to a VLM, gemini-2.5-flash-image. On a 375-example ChartQA test split, SAFE-Cascade reached 69.1% unified accuracy while invoking the VLM on 73.1% of queries. That is on par with (nominally 1.4 points above) the full-VLM baseline (67.7% accuracy, 100% VLM invocation), with 26.9% fewer VLM calls and a 9.3% lower estimated cost. The interface exposes the OCR evidence and the routing probability, and lets users adjust the escalation threshold.
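The gap between 26.9% fewer VLM calls and only 9.3% lower cost is easier to see with a small calculation. The sketch below assumes a simple cost model that is not stated in the summary: every query pays a fixed OCR plus text-LM overhead, escalated queries additionally pay one VLM call, and the baseline pays exactly one VLM call per query. The function names are hypothetical, and the paper's actual cost accounting may differ.

```python
def vlm_call_reduction(vlm_rate: float) -> float:
    """Fraction of VLM calls avoided relative to an always-VLM baseline."""
    return 1.0 - vlm_rate


def implied_overhead_ratio(vlm_rate: float, cost_reduction: float) -> float:
    """Overhead (OCR + text LM) per query, as a fraction of one VLM call.

    Assumed model (not from the source):
        cascade_cost  = overhead + vlm_rate * c_vlm
        baseline_cost = c_vlm
        cascade_cost / baseline_cost = 1 - cost_reduction
    Solving for overhead / c_vlm gives the expression below.
    """
    return (1.0 - cost_reduction) - vlm_rate


# Figures reported in the summary: 73.1% VLM invocation, 9.3% cost reduction.
call_reduction = vlm_call_reduction(0.731)
overhead = implied_overhead_ratio(0.731, 0.093)

print(f"VLM calls avoided: {call_reduction:.3f}")       # 0.269
print(f"Implied overhead / VLM cost: {overhead:.3f}")   # 0.176
```

Under this assumed model, the always-paid OCR and text-LM stage would cost roughly 18% of a VLM call, which absorbs most of the savings from skipped escalations.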

Key takeaway

AI architects designing chart question answering systems should consider a cost-adaptive routing mechanism. As SAFE-Cascade demonstrates, selectively engaging the expensive model can keep accuracy comparable to invoking the VLM on every query while cutting VLM calls by about a quarter (26.9%) and estimated cost by 9.3%. Evaluate your current VLM usage and explore multi-stage pipelines to optimize resource allocation and make routing decisions transparent.

Key insights

Selective modality routing can match full-VLM accuracy (69.1% vs. 67.7% on the ChartQA test split) while cutting VLM calls by 26.9% and estimated cost by 9.3%, and it makes the routing decision transparent to users.

Method

The system extracts chart text via OCR, obtains a provisional answer from a text-only LM, and then uses a Random Forest router to decide whether to escalate the query to the VLM.
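The three-stage flow above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the OCR, text-LM, VLM, and router components are passed in as plain callables, the router is assumed to return an escalation probability (as a Random Forest's class probability would), and the three features are hypothetical placeholders, since the summary does not list the actual inference-time features.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CascadeResult:
    answer: str
    used_vlm: bool
    escalate_prob: float  # router output, shown to the user in the UI


def route(
    question: str,
    chart_image: bytes,
    ocr: Callable[[bytes], str],
    text_lm: Callable[[str, str], str],
    vlm: Callable[[str, bytes], str],
    router: Callable[[Dict[str, float]], float],
    threshold: float = 0.5,  # user-adjustable escalation threshold
) -> CascadeResult:
    # Stage 1: extract chart text.
    ocr_text = ocr(chart_image)
    # Stage 2: provisional answer from the text-only model.
    provisional = text_lm(question, ocr_text)
    # Hypothetical inference-time features (assumed, not from the source).
    features = {
        "ocr_chars": float(len(ocr_text)),
        "answer_chars": float(len(provisional)),
        "answer_in_ocr": float(provisional.lower() in ocr_text.lower()),
    }
    # Stage 3: escalate only if the router's probability reaches the threshold.
    prob = router(features)
    if prob >= threshold:
        return CascadeResult(vlm(question, chart_image), True, prob)
    return CascadeResult(provisional, False, prob)


# Usage with stub components standing in for the real services.
result = route(
    "What were sales in 2020?",
    b"",
    ocr=lambda img: "Sales 2020: 42",
    text_lm=lambda q, text: "42",
    vlm=lambda q, img: "42 (from VLM)",
    router=lambda feats: 0.2,
)
print(result.used_vlm, result.answer)
```

Raising `threshold` accepts more text-only answers and lowers VLM usage; lowering it escalates more queries, trading cost for the VLM's accuracy.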

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.