GUIDE: Guided Updates for In-context Decision Evolution in LLM-Driven Spacecraft Operations
Summary
GUIDE (Guided Updates for In-context Decision Evolution) is a non-parametric policy improvement framework designed for LLM-driven spacecraft operations, addressing the limitations of static prompting in dynamic environments. It enables cross-episode adaptation without requiring model weight updates by evolving a structured, state-conditioned "playbook" of natural-language decision rules. A lightweight acting model handles real-time control, while an offline reflection process updates the playbook based on prior mission trajectories. Evaluated in an adversarial orbital interception task within the Kerbal Space Program Differential Games environment, GUIDE consistently outperformed static baselines. The framework demonstrates that context evolution in LLM agents can function as a policy search mechanism over structured decision rules for real-time, closed-loop spacecraft interaction, particularly in scenarios requiring adaptive reasoning under uncertainty.
Key takeaway
For research scientists developing autonomous agents for dynamic, real-time control systems, GUIDE offers a compelling alternative to traditional weight-update learning. You should consider implementing a "teacher-student" architecture where a lightweight online agent is guided by an evolving, natural-language playbook. This approach allows for continuous adaptation to unpredictable environments, like adversarial space operations, without the computational burden or deployment constraints of retraining large models.
Key insights
GUIDE lets an LLM agent adapt across episodes by evolving natural-language decision rules, with no weight updates; the acting model still runs in real time, while the rules themselves are revised offline.
Principles
- Separate online execution from offline policy improvement.
- Context evolution can serve as a learnable policy object.
- Structured natural language rules can encode adaptive behavior.
Method
GUIDE uses a "teacher-student" approach. A fixed acting model executes real-time control conditioned on the current playbook, while an offline meta-reasoning LLM (the Reflector/Curator) revises that playbook through ADD/UPDATE/REMOVE operations, drawing on past trajectories selected by $\epsilon$-biased reflection sampling.
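To make the ADD/UPDATE/REMOVE operations concrete, here is a minimal sketch of a playbook data structure. It is not the authors' implementation: the names `Rule`, `Playbook`, `apply`, and `render`, the rule IDs, and the example rule text are all hypothetical, chosen only to illustrate how state-conditioned natural-language rules could be stored, edited offline, and rendered into the acting model's prompt.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Rule:
    """One state-conditioned rule (hypothetical schema, not from the paper)."""
    condition: str  # natural-language description of when the rule applies
    action: str     # natural-language guidance for the acting model


@dataclass
class Playbook:
    """A keyed collection of rules edited by ADD/UPDATE/REMOVE operations."""
    rules: Dict[str, Rule] = field(default_factory=dict)

    def apply(self, op: str, rule_id: str, rule: Optional[Rule] = None) -> None:
        """Apply a single edit proposed by the offline reflection step."""
        if op == "ADD":
            if rule is None:
                raise ValueError("ADD requires a rule")
            if rule_id in self.rules:
                raise KeyError(f"rule {rule_id!r} already exists")
            self.rules[rule_id] = rule
        elif op == "UPDATE":
            if rule is None:
                raise ValueError("UPDATE requires a rule")
            if rule_id not in self.rules:
                raise KeyError(f"rule {rule_id!r} does not exist")
            self.rules[rule_id] = rule
        elif op == "REMOVE":
            if rule_id not in self.rules:
                raise KeyError(f"rule {rule_id!r} does not exist")
            del self.rules[rule_id]
        else:
            raise ValueError(f"unknown operation {op!r}")

    def render(self) -> str:
        """Render the playbook as text to place in the acting model's context."""
        return "\n".join(
            f"- [{rule_id}] IF {r.condition} THEN {r.action}"
            for rule_id, r in self.rules.items()
        )


if __name__ == "__main__":
    playbook = Playbook()
    # Illustrative rule text only; the paper's actual rules are not given here.
    playbook.apply("ADD", "r1", Rule("the guard is closing fast", "thrust away from the guard"))
    playbook.apply("UPDATE", "r1", Rule("the guard is closing fast", "thrust perpendicular to the guard"))
    print(playbook.render())
```

Keeping the playbook as an explicit keyed structure, rather than free-form prompt text, means each offline edit is small and auditable, and the acting model only ever sees the rendered result.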
In practice
- Use a playbook of state-conditioned rules for LLM adaptation.
- Implement a two-tiered guard-avoidance regime for spacecraft.
- Apply UCB1 for selecting among multiple playbook versions.
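The UCB1 item above can be sketched as a standard multi-armed bandit in which each arm is a playbook version. This is the textbook UCB1 rule, not code from the paper; the function name `ucb1_select`, the per-version bookkeeping lists, and the assumption that episode rewards are scaled to [0, 1] are all illustrative choices.

```python
import math
from typing import List


def ucb1_select(counts: List[int], total_rewards: List[float]) -> int:
    """Return the index of the playbook version to run next under UCB1.

    counts[i]        -- number of episodes already run with version i
    total_rewards[i] -- sum of rewards (assumed to lie in [0, 1]) for version i
    """
    if not counts or len(counts) != len(total_rewards):
        raise ValueError("counts and total_rewards must be non-empty and equal length")

    # Run every version once before comparing confidence bounds.
    for i, n in enumerate(counts):
        if n == 0:
            return i

    total = sum(counts)
    scores = [
        # mean reward + exploration bonus sqrt(2 ln t / n_i)
        total_rewards[i] / counts[i] + math.sqrt(2.0 * math.log(total) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical statistics for three playbook versions.
    counts = [10, 10, 2]
    total_rewards = [7.0, 4.0, 1.0]
    print(ucb1_select(counts, total_rewards))
```

The exploration bonus shrinks as a version accumulates episodes, so rarely tried playbook versions keep getting occasional trials while the best-performing one is used most often.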
Topics
- GUIDE Framework
- LLM Spacecraft Operations
- In-context Policy Evolution
- Natural Language Playbook
- Kerbal Space Program
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.