ProfiliTable: Profiling-Driven Tabular Data Processing via Agentic Workflows

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

ProfiliTable is an autonomous multi-agent framework for tabular data processing that addresses two weaknesses of current LLM-based approaches: they struggle with ambiguous instructions, and they often produce semantically flawed code. It uses dynamic profiling to iteratively refine a unified execution context. The framework combines a Profiler that explores the data in a ReAct style, a Generator that synthesizes code from retrieved operators, and an Evaluator–Summarizer loop that drives feedback-based refinement. On a benchmark covering 18 tabular task types, ProfiliTable consistently outperforms strong baselines and achieves state-of-the-art accuracy in both single-step and complex multi-step scenarios. It also maintains a 100% task-wise runnable rate with GPT-4o and GPT-5.2, indicating high fidelity, and remains cost-efficient, consuming an average of 24,907 tokens per single-step task with GPT-5.2.
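
The summary does not define the task-wise runnable rate. The sketch below assumes a task counts as runnable when its final generated program executes without raising an exception; the task format (a dict with "code" and "table") and the helpers `is_runnable` and `runnable_rate` are hypothetical illustrations, not the benchmark's harness.

```python
# Minimal sketch of a task-wise runnable rate. ASSUMPTION: a task is
# "runnable" when its final program executes without raising. The task
# format and both helpers are hypothetical, not the benchmark's harness.
from typing import Any


def is_runnable(code: str, table: list[dict[str, Any]]) -> bool:
    """Execute generated code against a copy of the table; report success."""
    # The program sees the table under the name `table`.
    namespace: dict[str, Any] = {"table": [dict(row) for row in table]}
    try:
        exec(code, namespace)  # untrusted code: sandbox this in real use
    except Exception:
        return False
    return True


def runnable_rate(tasks: list[dict[str, Any]]) -> float:
    """Fraction of tasks whose final program runs to completion."""
    if not tasks:
        return 0.0
    ok = sum(is_runnable(task["code"], task["table"]) for task in tasks)
    return ok / len(tasks)


if __name__ == "__main__":
    rows = [{"price": "10"}, {"price": "12"}]
    tasks = [
        {"code": "total = sum(int(r['price']) for r in table)", "table": rows},
        {"code": "total = sum(r['cost'] for r in table)", "table": rows},  # KeyError
    ]
    print(runnable_rate(tasks))  # 0.5
```

Runnable is a weaker property than correct: a program can execute cleanly and still compute the wrong result, so this rate is best read alongside accuracy rather than as a substitute for it.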

Key takeaway

If you are a data scientist or machine learning engineer automating tabular data processing, especially complex multi-step transformations, prioritize solutions that incorporate dynamic profiling and iterative feedback. Generic LLM-based tools often produce semantically flawed code when instructions are ambiguous. ProfiliTable shows that a multi-agent framework with active data exploration and closed-loop refinement can reach a 100% task-wise runnable rate and superior accuracy with GPT-4o and GPT-5.2 while remaining cost-efficient.

Key insights

Robust table processing relies on active, iterative profiling and feedback-driven refinement to bridge the gap between linguistic intent and tabular reality.

Principles

Dynamic profiling is crucial for reliably translating ambiguous user intents into robust table transformations.
Iterative feedback loops are essential for correcting cascading failures in complex multi-step workflows.
Focused retrieval of pre-validated operator templates helps suppress hallucination and improve code correctness.
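
To make focused retrieval concrete, here is a minimal sketch that matches a subtask description against a small library of pre-validated operator templates. The library entries, the `OperatorTemplate` fields, and the Jaccard token-overlap scorer are hypothetical stand-ins chosen for brevity; the summary does not describe ProfiliTable's actual retriever, which may use a different similarity measure entirely.

```python
# Minimal sketch of focused operator retrieval. ASSUMPTIONS: the library
# entries, their fields, and the Jaccard token-overlap scorer are
# hypothetical stand-ins, not ProfiliTable's actual retriever.
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorTemplate:
    name: str
    description: str
    template: str  # pre-validated code skeleton with {placeholders}


LIBRARY = [
    OperatorTemplate("fill_missing", "fill missing null values in a column",
                     "df['{col}'] = df['{col}'].fillna({value})"),
    OperatorTemplate("parse_dates", "convert a text column to dates",
                     "df['{col}'] = pd.to_datetime(df['{col}'], errors='coerce')"),
    OperatorTemplate("group_sum", "group rows by a key and sum a column",
                     "df.groupby('{key}')['{col}'].sum()"),
]


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens used for overlap scoring."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(subtask: str, k: int = 1) -> list[OperatorTemplate]:
    """Return the k templates whose name and description best overlap the subtask."""
    query = tokens(subtask)

    def jaccard(op: OperatorTemplate) -> float:
        doc = tokens(op.name + " " + op.description)
        union = query | doc
        return len(query & doc) / len(union) if union else 0.0

    return sorted(LIBRARY, key=jaccard, reverse=True)[:k]


if __name__ == "__main__":
    best = retrieve("sum revenue for each region group")[0]
    print(best.name)  # group_sum
```

In this sketch a code generator would only be shown the top-k skeletons and would fill in their placeholders, rather than writing operators from scratch; that narrowing is the intuition behind using retrieval to suppress hallucination.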

Method

ProfiliTable coordinates seven agents (Interpreter, Profiler, Decompositer, Generator, Evaluator, Summarizer, and Finalizer) in a closed, feedback-driven loop that iteratively refines the generated code.
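
The control flow of such a closed loop can be sketched as follows. This is a hypothetical, heavily simplified illustration rather than ProfiliTable's implementation: each agent is a deterministic stub standing in for an LLM call, the Finalizer is folded into `run`, the Evaluator here only checks that the program runs (whereas the framework computes task-specific scores), and `max_iters` and `threshold` are assumed parameters.

```python
# Hypothetical sketch of a closed-loop agent pipeline. Every "agent" is a
# deterministic stub standing in for an LLM call; names, the scoring rule,
# and the stopping threshold are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Any, Callable

Table = list[dict[str, Any]]


@dataclass
class Context:
    """Unified execution context shared and refined across iterations."""
    instruction: str
    profile: dict[str, Any] = field(default_factory=dict)
    subtasks: list[str] = field(default_factory=list)
    feedback: list[str] = field(default_factory=list)


def interpret(instruction: str) -> Context:
    return Context(instruction=instruction.strip().lower())


def profile(ctx: Context, table: Table) -> None:
    # Record one sample value per column so later agents see real formats.
    ctx.profile = {col: table[0][col] for col in table[0]} if table else {}


def decompose(ctx: Context) -> None:
    ctx.subtasks = ["clean amount", "sum amount"]


def _to_number(value: Any) -> float:
    """Strip '$' and thousands separators before converting."""
    return float(str(value).replace("$", "").replace(",", ""))


def generate(ctx: Context) -> Callable[[Table], float]:
    # Stub generator: handles currency formatting only once feedback asks for it.
    if any("currency" in note for note in ctx.feedback):
        return lambda t: sum(_to_number(r["amount"]) for r in t)
    return lambda t: sum(float(r["amount"]) for r in t)


def evaluate(program: Callable[[Table], float], table: Table) -> tuple[float, str]:
    # Simplified scoring: a crash scores 0 and yields a diagnostic message.
    try:
        program(table)
    except Exception as exc:
        return 0.0, f"{type(exc).__name__}: {exc}"
    return 1.0, "ran cleanly"


def summarize(ctx: Context, diagnostic: str) -> None:
    # Turn a raw error into actionable feedback for the next generation round.
    if "could not convert" in diagnostic:
        ctx.feedback.append("amount values contain currency symbols; strip them")


def run(instruction: str, table: Table, max_iters: int = 3, threshold: float = 1.0) -> Any:
    ctx = interpret(instruction)
    profile(ctx, table)
    decompose(ctx)
    for _ in range(max_iters):
        program = generate(ctx)
        score, diagnostic = evaluate(program, table)
        if score >= threshold:
            return program(table)  # "Finalizer": emit the accepted result
        summarize(ctx, diagnostic)
    return None  # no program met the threshold within the budget
```

With rows `[{"amount": "$1,200"}, {"amount": "$30.50"}]`, the first program fails on the currency symbols, the Summarizer turns the resulting `ValueError` into feedback, and the second iteration returns `1230.5`.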

In practice

Actively interrogate tables via ReAct loops to formulate and test data hypotheses.
Decompose complex instructions into ordered subtasks for targeted operator retrieval.
Compute task-specific scores and diagnostic insights for closed-loop refinement.
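
The first practice, ReAct-style interrogation, can be sketched as a loop that alternates a reasoning step, a tool call on the table, and an observation, with each next step chosen from earlier observations. The tool names, the scripted `policy` that stands in for the LLM, and the trace format are hypothetical.

```python
# Minimal sketch of ReAct-style table profiling. ASSUMPTIONS: the tools,
# the scripted policy (standing in for an LLM), and the trace format are
# hypothetical illustrations.
import re
from typing import Any, Callable, Optional

Table = list[dict[str, Any]]


def count_missing(table: Table, col: str) -> int:
    """Count empty or absent cells in one column."""
    return sum(1 for row in table if row.get(col) in (None, ""))


def value_shapes(table: Table, col: str) -> list[str]:
    """Distinct coarse shapes of non-empty values, e.g. '2024-01-05' -> 'dddd-dd-dd'."""
    return sorted({re.sub(r"\d", "d", str(row[col])) for row in table if row.get(col)})


TOOLS: dict[str, Callable[[Table, str], Any]] = {
    "count_missing": count_missing,
    "value_shapes": value_shapes,
}


def policy(history: list[tuple[str, Any]]) -> Optional[tuple[str, str]]:
    """Stand-in for the LLM: pick the next (thought, action) from past observations."""
    done = {action for action, _ in history}
    if "count_missing" not in done:
        return ("Check for gaps a cleaning operator must handle.", "count_missing")
    if "value_shapes" not in done:
        return ("Hypothesis: dates appear in several formats; test it.", "value_shapes")
    return None  # enough evidence gathered; stop exploring


def react_profile(table: Table, col: str) -> list[str]:
    """Interleave Thought -> Action -> Observation until the policy stops."""
    history: list[tuple[str, Any]] = []
    trace: list[str] = []
    while (step := policy(history)) is not None:
        thought, action = step
        observation = TOOLS[action](table, col)
        history.append((action, observation))
        trace += [f"Thought: {thought}", f"Action: {action}({col!r})",
                  f"Observation: {observation}"]
    shapes = dict(history)["value_shapes"]
    verdict = "confirmed" if len(shapes) > 1 else "rejected"
    trace.append(f"Conclusion: mixed-format hypothesis {verdict}")
    return trace


if __name__ == "__main__":
    rows = [{"signup": "2024-01-05"}, {"signup": "05/01/2024"}, {"signup": ""}]
    print("\n".join(react_profile(rows, "signup")))
```

On the sample rows, the loop observes one missing value and two distinct date shapes, so it confirms the mixed-format hypothesis; that evidence is the kind of context a decomposition step could use to plan a date-normalization subtask.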

Topics

Tabular Data Processing
Multi-Agent Systems
Dynamic Profiling
LLM Agents
Retrieval-Augmented Generation
Data Wrangling

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.