ProfiliTable: Profiling-Driven Tabular Data Processing via Agentic Workflows
Summary
ProfiliTable is an autonomous multi-agent framework for tabular data processing. It targets two weaknesses of current LLM-based approaches: mishandling ambiguous instructions and generating semantically flawed code. It employs dynamic profiling to iteratively refine a unified execution context. The framework integrates a Profiler for ReAct-style data exploration, a Generator that synthesizes code using retrieved operators, and an Evaluator–Summarizer loop for feedback-driven refinement. Tested on a benchmark of 18 tabular task types, ProfiliTable consistently outperforms strong baselines, achieving state-of-the-art accuracy in both single-step and complex multi-step scenarios. It also maintains a 100% task-wise runnable rate with both GPT-4o and GPT-5.2, and it remains cost-efficient: single-step tasks consume an average of 24,907 tokens with GPT-5.2.
Key takeaway
If you are a data scientist or machine learning engineer automating tabular data processing, especially complex multi-step transformations, prioritize solutions that incorporate dynamic profiling and iterative feedback. Generic LLM-based tools often produce semantically flawed code when instructions are ambiguous. ProfiliTable shows that a multi-agent framework with active data exploration and closed-loop refinement can reach a 100% task-wise runnable rate and higher accuracy than strong baselines with GPT-4o and GPT-5.2, while remaining cost-efficient.
Key insights
Robust table processing relies on active, iterative profiling and feedback-driven refinement to bridge linguistic intent with tabular reality.
Principles
- Dynamic profiling is crucial for reliably translating ambiguous user intents into robust table transformations.
- Iterative feedback loops are essential for correcting cascading failures in complex multi-step workflows.
- Focused retrieval of pre-validated operator templates helps suppress hallucination and improve code correctness.
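The third principle can be illustrated with a minimal sketch. Nothing below comes from the paper: the operator names, their descriptions, the templates, and the token-overlap (Jaccard) ranking are all assumptions chosen to keep the example self-contained. The idea it shows is that the code generator fills in a vetted template selected for a subtask rather than writing the transformation from scratch.

```python
import re

# Hypothetical operator library. Names, descriptions, and templates are
# illustrative only and are not taken from ProfiliTable.
OPERATORS = {
    "drop_duplicates": {
        "description": "remove duplicate rows from the table",
        "template": "df = df.drop_duplicates()",
    },
    "fill_missing": {
        "description": "fill missing values in a column with a default value",
        "template": "df[{col!r}] = df[{col!r}].fillna({value!r})",
    },
    "rename_column": {
        "description": "rename a column to a new name",
        "template": "df = df.rename(columns={{{old!r}: {new!r}}})",
    },
}


def tokens(text):
    """Lowercase word set used for a simple lexical match."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(subtask, k=1):
    """Rank operators by Jaccard overlap between subtask and description."""
    query = tokens(subtask)

    def score(name):
        doc = tokens(OPERATORS[name]["description"])
        return len(query & doc) / len(query | doc)

    return sorted(OPERATORS, key=score, reverse=True)[:k]


# Retrieve an operator for one subtask, then instantiate its template.
best = retrieve("fill missing values in column age")[0]
code = OPERATORS[best]["template"].format(col="age", value=0)
print(best, "->", code)
```

A real system would more plausibly rank operators with embeddings than with word overlap; the lexical score here is only a stand-in that needs no external dependencies.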
Method
ProfiliTable employs an Interpreter, Profiler, Decompositer, Generator, Evaluator, Summarizer, and Finalizer in a closed-loop, feedback-driven system for iterative refinement of code generation.
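A rough sketch of the closed loop between the Generator, Evaluator, and Summarizer roles follows. It is an assumption-laden illustration, not the framework's implementation: the function names, the stopping threshold, the row-match score, and the stubbed generator (which stands in for an LLM call) are all hypothetical.

```python
def refine(generate, evaluate, summarize, max_rounds=3, threshold=1.0):
    """Generate, score, and feed diagnostics back until the score is good enough."""
    feedback = []
    best_candidate, best_score = None, float("-inf")
    for round_no in range(1, max_rounds + 1):
        candidate = generate(feedback)
        score, diagnostics = evaluate(candidate)
        if score > best_score:
            best_candidate, best_score = candidate, score
        if score >= threshold:
            break
        feedback.append(summarize(round_no, diagnostics))
    return best_candidate, best_score, feedback


# Toy task (hypothetical): normalise names by trimming and lowercasing.
rows = ["  Alice ", "BOB", " carol"]
expected = ["alice", "bob", "carol"]


def stub_generate(feedback):
    # Stand-in for an LLM: the first attempt forgets to lowercase.
    if not feedback:
        return lambda xs: [x.strip() for x in xs]
    return lambda xs: [x.strip().lower() for x in xs]


def stub_evaluate(candidate):
    # Run the candidate and score it as the fraction of matching rows.
    try:
        out = candidate(rows)
    except Exception as exc:
        return 0.0, f"execution failed: {exc}"
    matches = sum(a == b for a, b in zip(out, expected))
    return matches / len(expected), f"{matches}/{len(expected)} rows match"


def stub_summarize(round_no, diagnostics):
    # Condense diagnostics into a short note for the next generation round.
    return f"round {round_no}: {diagnostics}"


candidate, score, feedback = refine(stub_generate, stub_evaluate, stub_summarize)
print(score, feedback)
```

Keeping the best-scoring candidate means the loop still returns something usable if the round budget runs out before the threshold is met.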
In practice
- Actively interrogate tables via ReAct loops to formulate and test data hypotheses.
- Decompose complex instructions into ordered subtasks for targeted operator retrieval.
- Compute task-specific scores and diagnostic insights for closed-loop refinement.
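The first practice above, testing data hypotheses against the table, might look something like the sketch below. It shows only the observation side of a ReAct-style loop, with no LLM involved. The sample data, the `profile_column` and `check_hypothesis` helpers, and the two hypothesis kinds are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample table with a blank cell and a placeholder string.
RAW = """id,age,city
1,34,Paris
2,,paris
3,n/a,Berlin
"""


def load(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def profile_column(rows, col):
    """Summarise one column: missing cells, distinct values, non-integer values."""
    values = [r[col] for r in rows]
    non_empty = [v for v in values if v.strip()]
    numeric = [v for v in non_empty if v.strip().lstrip("-").isdigit()]
    return {
        "missing": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "non_numeric": sorted(set(non_empty) - set(numeric)),
    }


def check_hypothesis(rows, col, kind):
    """Test a hypothesis about a column and return True or False."""
    summary = profile_column(rows, col)
    if kind == "is_integer":
        return not summary["non_numeric"]
    if kind == "no_missing":
        return summary["missing"] == 0
    raise ValueError(f"unknown hypothesis kind: {kind}")


rows = load(RAW)
for col, kind in [("id", "is_integer"), ("age", "is_integer"), ("age", "no_missing")]:
    print(col, kind, check_hypothesis(rows, col, kind))
print(profile_column(rows, "age"))
```

In an agentic setting, each returned observation would be handed back to the model so it can decide which hypothesis to test next, for example whether `n/a` should be treated as missing before `age` is cast to an integer.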
Topics
- Tabular Data Processing
- Multi-Agent Systems
- Dynamic Profiling
- LLM Agents
- Retrieval-Augmented Generation
- Data Wrangling
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.