Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

Goedel-Prover-V2, an open-source model extensively trained on 1.8 million formal-math examples, exhibits a significant loss of general tool-use capabilities after domain specialization, with function-calling accuracy plummeting from 89.4% to nearly 0%. Researchers investigated whether this "agentic collapse" is reversible. They found that fine-tuning the specialized model with as few as 100 Lean-specific tool-use traces was sufficient to restore robust tool-calling behavior. This recovery was not domain-specific; the regained capability transferred effectively, improving performance on the Berkeley Function Calling Leaderboard from near zero to 83.8%, close to the base model's 89.4%. Additionally, on ProofNet, pass@32 improved from 21.51% to 25.81%, demonstrating practical utility within the domain.

Key takeaway

For AI Scientists and Machine Learning Engineers specializing models, be aware that heavy supervised fine-tuning can suppress general capabilities like tool use. If your specialized model shows reduced function-calling accuracy, consider fine-tuning with a small, domain-specific agentic dataset. This approach can reactivate dormant general abilities, potentially improving performance across diverse tasks without extensive retraining.

Key insights

Domain specialization can suppress general tool-use in models, but small amounts of agentic data can reactivate it.

Principles

Method

Fine-tuning a specialized model with a small dataset of domain-specific agentic traces (e.g., 100 Lean-specific traces) can restore general tool-use abilities.

In practice

Topics

Best for: AI Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.