Getting more from each token: How Copilot improves context handling and model routing
Summary
GitHub Copilot has introduced significant enhancements to its context handling and model routing capabilities, aiming for greater efficiency in agentic workflows. Key improvements include prompt caching and deferred tool loading in VS Code, which reduce redundant information sent to the model and load tool definitions only when needed. Additionally, the "Auto" model selection feature, powered by the HyDRA routing model, dynamically chooses the optimal language model for a given task based on intent and real-time system health. This system achieves up to 72.5% cost savings while maintaining quality, outperforming other routers like OpenRouter Auto and Azure Foundry. Auto is currently live in VS Code, github.com, and mobile, with plans to expand to Copilot CLI and other IDEs, and will become the default for Free and Student plans.
Key takeaway
For AI engineers optimizing LLM-powered developer tools, Copilot's new Auto model selection and context management are worth understanding. Use Auto as your default, start a new session for each distinct task, and avoid changing model settings mid-session so that cached context stays reusable. Check your AI usage page regularly to identify and refine cost-effective coding patterns, so that your credits go to valuable work.
Key insights
GitHub Copilot now intelligently routes tasks to optimal models and caches context for enhanced efficiency.
Principles
- Dynamic model selection optimizes cost and quality.
- Context caching reduces redundant token usage.
- Tool definitions should be loaded on demand.
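The on-demand loading principle can be illustrated with a small sketch. This is not Copilot's implementation: `LazyToolRegistry`, `make_schema`, and the tool names are hypothetical, and the sketch only assumes that a short summary per tool is sent up front while the full definition is built when the tool is first requested.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LazyToolRegistry:
    """Hypothetical registry: short summaries up front, full schemas on demand."""
    summaries: Dict[str, str]
    loaders: Dict[str, Callable[[], dict]]
    _loaded: Dict[str, dict] = field(default_factory=dict)

    def prompt_stub(self) -> List[str]:
        # Only one-line summaries go into the initial prompt.
        return [f"{name}: {text}" for name, text in sorted(self.summaries.items())]

    def load(self, name: str) -> dict:
        # The full definition is built the first time the tool is requested,
        # then reused on later requests.
        if name not in self._loaded:
            self._loaded[name] = self.loaders[name]()
        return self._loaded[name]

    def loaded_names(self) -> List[str]:
        return sorted(self._loaded)


def make_schema(name: str) -> dict:
    # Stand-in for a large JSON schema; the fields are illustrative only.
    return {"name": name, "parameters": {"type": "object", "properties": {}}}


registry = LazyToolRegistry(
    summaries={"run_tests": "Run the test suite", "search_code": "Search the repo"},
    loaders={
        "run_tests": lambda: make_schema("run_tests"),
        "search_code": lambda: make_schema("search_code"),
    },
)
```

Calling `registry.load("search_code")` builds only that one definition, so a task that never runs tests never pays for the `run_tests` schema.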
Method
Auto model selection combines real-time model health (availability, utilization, cost) with task-aware routing via HyDRA, which considers reasoning depth and code complexity to choose the best-fit model.
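A rough sketch of how health signals and task signals might be combined is shown here. HyDRA's actual scoring is not described in this summary, so everything in the block is an assumption: the `ModelStatus` fields, the integer capability scale, the utilization threshold, and the "cheapest capable healthy model" rule are all illustrative stand-ins.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelStatus:
    name: str
    available: bool
    utilization: float    # 0.0 (idle) to 1.0 (saturated)
    cost_per_call: float  # relative cost units, illustrative
    capability: int       # 1 (light) to 3 (deep reasoning), illustrative


def required_capability(reasoning_depth: int, code_complexity: int) -> int:
    # Hypothetical task signal: take the harder of the two dimensions.
    return max(reasoning_depth, code_complexity)


def route(models: List[ModelStatus], reasoning_depth: int, code_complexity: int,
          max_utilization: float = 0.9) -> ModelStatus:
    need = required_capability(reasoning_depth, code_complexity)
    # Health filter: drop unavailable or near-saturated models.
    healthy = [m for m in models if m.available and m.utilization < max_utilization]
    # Task filter: keep models strong enough for the task.
    capable = [m for m in healthy if m.capability >= need]
    if not capable:
        raise RuntimeError("no healthy model meets the task requirement")
    # Cheapest capable model wins; ties break toward lower utilization.
    return min(capable, key=lambda m: (m.cost_per_call, m.utilization))


MODELS = [
    ModelStatus("small", True, 0.2, 1.0, 1),
    ModelStatus("medium", True, 0.5, 3.0, 2),
    ModelStatus("large", True, 0.3, 10.0, 3),
    ModelStatus("cheap-but-busy", True, 0.95, 0.5, 3),
]
```

With these made-up numbers, a simple task routes to `small` and a deep-reasoning task routes to `large`, while `cheap-but-busy` is skipped because of its utilization.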
In practice
- Start new Copilot sessions for distinct tasks.
- Compact long sessions to reset prompt prefixes.
- Enable only necessary tools for a given task.
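The reasoning behind these practices can be sketched with a toy prefix-cache model. It assumes that a cache can reuse only the leading part of a prompt that is identical to the previous request, and it compares whole segments rather than tokens for simplicity. The segment labels and the idea that tool definitions sit near the start of the prompt are assumptions made for illustration.

```python
from typing import Sequence


def shared_prefix_len(previous: Sequence[str], current: Sequence[str]) -> int:
    """Count leading segments that are identical in both prompts."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


def cache_hit_ratio(previous: Sequence[str], current: Sequence[str]) -> float:
    """Fraction of the current prompt that a prefix cache could reuse."""
    if not current:
        return 0.0
    return shared_prefix_len(previous, current) / len(current)


turn_1 = ["system", "tools:v1", "user: fix bug"]
# Appending to the session keeps the earlier segments identical.
turn_2 = turn_1 + ["assistant: patch", "user: add test"]
# Changing the tool set mid-session alters an early segment (assumed layout).
turn_2_changed = ["system", "tools:v2", "user: fix bug",
                  "assistant: patch", "user: add test"]
```

In this toy model, `turn_2` reuses three of its five segments from `turn_1`, while `turn_2_changed` reuses only the first, because everything after the first differing segment has to be reprocessed.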
Topics
- GitHub Copilot
- LLM Routing
- Context Management
- Prompt Caching
- HyDRA Model
- AI Efficiency
Best for: Machine Learning Engineer, NLP Engineer, CTO, Software Engineer, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by The GitHub Blog.