A Close Look At World Model Recovery In Supervised Fine-Tuned LLM Planners
Summary
A recent study investigates how supervised fine-tuning (SFT) affects the ability of large language models (LLMs) to recover and represent "world models" for classical planning problems. The researchers devised interpretability experiments that examine both the internal representations and the generative capabilities of fine-tuned LLMs. They found that fine-tuning on valid action sequences allows LLMs to linearly encode action validity and certain state predicates. Notably, models that struggle to classify action validity from their output probabilities can still develop internal representations that distinguish valid from invalid actions. Broader state space coverage during fine-tuning, for example through random walk data, significantly improves how accurately the underlying world model is recovered. The work also contributes a methodology for applying interpretability techniques specifically to planning LLMs.
Key takeaway
For machine learning engineers developing LLM planners, it matters whether the model has actually recovered the planning domain's world model, not only whether its outputs look plausible. Fine-tune on diverse, valid action sequences and consider adding random walk data to broaden state space coverage. According to the study, broader coverage improves world model recovery, and action validity can be encoded internally even when output probabilities fail to separate valid from invalid actions. When assessing what the model knows about action validity, analyze its internal representations as well as its outputs.
Key insights
SFT enables LLMs to internally represent planning world models, with broader training data improving recovery.
Principles
- SFT on valid action sequences lets LLMs linearly encode action validity and certain state predicates.
- Internal representations can separate valid from invalid actions even when output probabilities cannot.
- Broader state space coverage improves world model recovery.
Method
The researchers devised a series of interpretability experiments that assess world model recovery from two angles: the internal representations of fine-tuned LLMs and their generative capabilities.
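To make "linearly encode" concrete, here is a minimal sketch of a linear probe: a logistic-regression classifier trained to predict action validity from hidden-state vectors. It is not the study's code. The activations are synthetic stand-ins generated by the hypothetical helper `make_synthetic_activations`; with a real model you would replace them with hidden states extracted at a chosen layer, labeled by whether the corresponding action is valid.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, epochs=500):
    """Fit a logistic-regression probe (w, b) by gradient descent.

    X: (n, d) array of activations; y: (n,) array of 0/1 validity labels.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the logits
        grad = probs - y                             # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(X, y, w, b):
    """Fraction of examples where the probe's prediction matches the label."""
    preds = (X @ w + b) > 0
    return float((preds == y.astype(bool)).mean())

def make_synthetic_activations(n=400, d=16, seed=0):
    """ASSUMPTION: synthetic stand-in for LLM hidden states.

    Valid (1) and invalid (0) examples are shifted in opposite directions
    along one random unit vector, so the label is linearly decodable.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d)) + 3.0 * np.outer(2 * y - 1, direction)
    return X, y.astype(float)

X, y = make_synthetic_activations()
split = 300  # first 300 examples train the probe, the rest are held out
w, b = train_linear_probe(X[:split], y[:split])
held_out_acc = probe_accuracy(X[split:], y[split:], w, b)
print(f"held-out probe accuracy: {held_out_acc:.2f}")
```

High held-out accuracy from a probe this simple suggests the property is linearly readable from the representation. Comparing it against a classifier built from the model's output probabilities is one way to check for the gap the summary describes.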
In practice
- Use valid action sequences for SFT to encode planning logic.
- Incorporate random walk data for broader state space coverage.
- Analyze internal representations for action validity insights.
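The random walk idea above can be sketched in a few lines: start from some state, repeatedly sample a valid action uniformly at random, and record the resulting action sequence as training data. The Blocksworld-style toy domain below is an assumption chosen only for illustration, and the names `valid_actions`, `apply_action`, and `random_walk` are hypothetical, not taken from the study.

```python
import random

# ASSUMPTION: a state is a sorted tuple of non-empty stacks, each stack a
# tuple of block names listed bottom-to-top.

def valid_actions(state):
    """Return all (block, target) moves that are legal in `state`."""
    actions = []
    for i, src in enumerate(state):
        block = src[-1]  # only the top block of a stack can move
        if len(src) > 1:
            actions.append((block, "table"))  # unstack onto the table
        for j, dst in enumerate(state):
            if i != j:
                actions.append((block, dst[-1]))  # stack onto another top block
    return actions

def apply_action(state, action):
    """Apply a legal move and return the normalized successor state."""
    block, target = action
    stacks = [list(s) for s in state]
    src = next(s for s in stacks if s[-1] == block)
    src.pop()
    if target == "table":
        stacks.append([block])
    else:
        dst = next(s for s in stacks if s and s[-1] == target)
        dst.append(block)
    return tuple(sorted(tuple(s) for s in stacks if s))

def random_walk(start, length, rng):
    """Sample `length` uniformly random valid actions starting from `start`."""
    state, plan, visited = start, [], {start}
    for _ in range(length):
        action = rng.choice(valid_actions(state))
        state = apply_action(state, action)
        plan.append(action)
        visited.add(state)
    return plan, visited

start = (("a",), ("b",), ("c",))  # three blocks, all on the table
plan, visited = random_walk(start, length=50, rng=random.Random(0))
print(f"{len(plan)} actions, {len(visited)} distinct states visited")
```

Tracking `visited` gives a rough coverage measure: the three-block domain has 13 reachable states, so the fraction visited indicates how much of the state space a set of walks exposes to the model.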
Topics
- Large Language Models
- Supervised Fine-Tuning
- Classical Planning
- World Models
- LLM Interpretability
- State Space Coverage
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.