Build an Agent That Thinks Like a Data Scientist: How We Hit #1 on DABStep with Reusable Tool Generation
Summary
The NVIDIA KGMON (NeMo Agent Toolkit) Data Explorer is an autonomous data analysis agent architecture designed by the NVIDIA Kaggle Grandmasters (KGMON) LLM Agent Research Team. Described in an article published March 13, 2026, it specializes in dataset exploration and analysis, handling multi-step reasoning, tool calling, and iterative data analysis. The system achieved first place on the Data Agent Benchmark for Multi-step Reasoning (DABStep), with a 30x speedup over the Claude Code baseline. Its multi-phase approach separates foundational knowledge building from rapid inference. It targets two kinds of work: open-ended exploratory data analysis (EDA), handled by a ReAct agent with a Jupyter Notebook tool, and multi-step rule-based tabular data QA, handled by a Tool Calling Agent with specialized tools. On complex tasks, this architecture significantly outperforms other solutions such as AntGroup's DataPilot and Google AI's DS-STAR.
Key takeaway
AI Architects and Research Scientists building data analysis agents should consider a multi-phase architecture that separates tool generation from inference. Investing upfront in a learning loop that creates reusable, generalized functions lets lightweight models execute rapidly at inference time, which yields significant speedups and superior performance on complex, multi-step tabular data tasks.
Key insights
Separating knowledge building from inference via reusable tool generation dramatically improves agent performance and efficiency.
Principles
- Complex data questions share foundational operations.
- Iterative testing refines generalized functions.
- Offline reflection enhances live inference.
Method
A three-phase approach: a Learning Loop generates reusable tools, a Fast Inference phase applies them, and an Unsupervised Offline Reflection phase refines insights for future inference.
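To make the division of labor concrete, here is a minimal sketch of the three phases in plain Python. It is not the team's implementation: in the article the tools are generated by a model, whereas this sketch assumes the candidate tools already exist as ordinary functions. Every name here (`ToolLibrary`, `learning_loop`, `fast_inference`, `offline_reflection`) and the toy data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Rows = List[dict]
Tool = Callable[[Rows], float]


@dataclass
class ToolLibrary:
    """Hypothetical store: tools kept by phase 1, notes added by phase 3."""
    tools: Dict[str, Tool] = field(default_factory=dict)
    insights: List[str] = field(default_factory=list)


def learning_loop(candidates: Dict[str, Tool], sample: Rows,
                  expected: Dict[str, float]) -> ToolLibrary:
    """Phase 1: keep only candidate tools that reproduce known answers."""
    library = ToolLibrary()
    for name, tool in candidates.items():
        if abs(tool(sample) - expected[name]) < 1e-9:
            library.tools[name] = tool
    return library


def fast_inference(library: ToolLibrary, tool_name: str, rows: Rows) -> float:
    """Phase 2: answer by calling a stored tool instead of writing new code."""
    return library.tools[tool_name](rows)


def offline_reflection(library: ToolLibrary,
                       log: List[Tuple[str, float]]) -> None:
    """Phase 3: without labels, flag questions that received differing answers."""
    seen: Dict[str, set] = {}
    for question, answer in log:
        seen.setdefault(question, set()).add(answer)
    for question, answers in seen.items():
        if len(answers) > 1:
            library.insights.append(f"inconsistent answers for: {question}")


# Toy data (assumed for illustration only).
sample = [{"amount": 10.0, "fee": 1.0}, {"amount": 30.0, "fee": 2.0}]
candidates = {
    "total_fee": lambda rows: sum(r["fee"] for r in rows),
    # Deliberately wrong, so the learning loop rejects it.
    "mean_amount": lambda rows: sum(r["amount"] for r in rows) / len(rows) + 1,
}
expected = {"total_fee": 3.0, "mean_amount": 20.0}

library = learning_loop(candidates, sample, expected)
answer = fast_inference(library, "total_fee", [{"amount": 5.0, "fee": 0.5}])
offline_reflection(library, [("q1", 0.5), ("q1", 0.7), ("q2", 1.0)])
```

The design point the sketch tries to capture is that the expensive validation happens once in `learning_loop`, while `fast_inference` is a plain lookup and call.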
In practice
- Use a heavyweight model for initial tool generation.
- Employ a lightweight model for rapid inference.
- Integrate reflection and group-consistency for quality control.
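The summary does not say how group-consistency is checked. One plausible reading, sketched below as an assumption, is a majority vote over the answers produced for a group of related runs: an answer is accepted only when more than a chosen share of the group agrees. The function name `group_consistent_answer` and the `min_agreement` threshold are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Tuple


def group_consistent_answer(
    answers: List[Hashable], min_agreement: float = 0.5
) -> Tuple[Optional[Hashable], float]:
    """Return (majority answer or None, agreement share) for a group of answers.

    Assumption: "group-consistency" is modeled as a simple majority vote.
    """
    if not answers:
        return None, 0.0
    top, count = Counter(answers).most_common(1)[0]
    share = count / len(answers)
    # Accept only when strictly more than min_agreement of the group agrees.
    return (top if share > min_agreement else None), share


accepted, share = group_consistent_answer(["A", "A", "B"])
rejected, tie_share = group_consistent_answer(["A", "B"])
```

Here the first call accepts "A" because two of three answers agree, while the second returns `None` because an even split does not clear the threshold. A rejected group could then be sent back for reflection.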
Topics
- Data Analysis Agents
- DABStep Benchmark
- LLM Agent Architectures
- Exploratory Data Analysis
- Multi-step Reasoning
Best for: AI Architect, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.