Getting more from each token: How Copilot improves context handling and model routing

2026-06-17 · Source: The GitHub Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, long

Summary

GitHub Copilot has improved its context handling and model routing to make agentic workflows more efficient. In VS Code, prompt caching reduces the redundant information sent to the model, and deferred tool loading loads tool definitions only when they are needed. The "Auto" model selection feature, powered by the HyDRA routing model, dynamically chooses a language model for each task based on intent and real-time system health. The system achieves up to 72.5% cost savings while maintaining quality, outperforming other routers such as OpenRouter Auto and Azure Foundry. Auto is live in VS Code, github.com, and mobile, is planned for Copilot CLI and other IDEs, and will become the default for Free and Student plans.

Key takeaway

AI engineers who optimize LLM-powered developer tools should understand Copilot's new Auto model selection and context management. Use Auto as the default, start a new session for each distinct task, and avoid changing model settings mid-session so that the prompt cache stays effective. Check the AI usage page regularly to find and refine cost-effective coding patterns, so that credits are spent on valuable work.

Key insights

GitHub Copilot now routes each task to a best-fit model and caches context to cut redundant token usage.

Principles

Dynamic model selection optimizes cost and quality.
Context caching reduces redundant token usage.
Tool definitions should be loaded on demand.
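
The third principle can be illustrated with a minimal sketch. It is not Copilot's implementation: the `LazyToolRegistry` class, the tool names, and the schema shapes are all hypothetical. It only shows the idea of keeping short summaries in the initial prompt and fetching a full tool definition the first time that tool is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LazyToolRegistry:
    """Hypothetical registry: short summaries up front, full schemas on first use."""
    summaries: Dict[str, str]
    loaders: Dict[str, Callable[[], dict]]
    loaded: Dict[str, dict] = field(default_factory=dict)

    def prompt_stub(self) -> List[str]:
        # Only these one-line summaries go into the initial prompt.
        return [f"{name}: {text}" for name, text in sorted(self.summaries.items())]

    def resolve(self, name: str) -> dict:
        # Load the full definition the first time it is requested, then reuse it.
        if name not in self.loaded:
            self.loaded[name] = self.loaders[name]()
        return self.loaded[name]


# Illustrative tools only; names and parameters are assumptions.
registry = LazyToolRegistry(
    summaries={
        "read_file": "Read a file from the workspace",
        "run_tests": "Run the project's test suite",
    },
    loaders={
        "read_file": lambda: {"name": "read_file", "parameters": {"path": "string"}},
        "run_tests": lambda: {"name": "run_tests", "parameters": {"filter": "string"}},
    },
)

stub = registry.prompt_stub()          # cheap text sent with every request
schema = registry.resolve("read_file")  # full definition loaded only now
```

After this runs, only `read_file` has a full definition in memory; `run_tests` stays a one-line summary until something asks for it.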

Method

Auto model selection combines real-time model health (availability, utilization, cost) with task-aware routing via HyDRA, which considers reasoning depth and code complexity to choose the best-fit model.
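
To make the two inputs concrete, here is a rule-based sketch of health-aware, task-aware selection. It is a stand-in, not HyDRA: the summary does not describe how HyDRA scores tasks, so the `ModelStatus` and `Task` types, the model names, the numeric scales, and the "cheapest capable model" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelStatus:
    name: str            # hypothetical model identifier
    available: bool      # real-time health: is the model reachable?
    utilization: float   # 0.0 (idle) to 1.0 (saturated)
    cost_per_1k: float   # assumed relative cost per 1k tokens
    capability: float    # assumed strength score, 0.0 to 1.0


@dataclass(frozen=True)
class Task:
    reasoning_depth: float  # 0.0 to 1.0, assumed to come from a task classifier
    code_complexity: float  # 0.0 to 1.0


def pick_model(task: Task, models: List[ModelStatus],
               max_utilization: float = 0.9) -> ModelStatus:
    """Pick the cheapest healthy model that meets the task's demands."""
    required = max(task.reasoning_depth, task.code_complexity)
    healthy = [m for m in models if m.available and m.utilization < max_utilization]
    if not healthy:
        raise RuntimeError("no healthy model available")
    capable = [m for m in healthy if m.capability >= required]
    if capable:
        return min(capable, key=lambda m: m.cost_per_1k)
    # Nothing healthy is strong enough: fall back to the strongest healthy model.
    return max(healthy, key=lambda m: m.capability)


small = ModelStatus("small-fast", True, 0.30, 1.0, 0.4)
large = ModelStatus("large-reasoning", True, 0.50, 10.0, 0.9)
large_down = ModelStatus("large-reasoning", False, 0.50, 10.0, 0.9)

easy_choice = pick_model(Task(0.2, 0.3), [small, large])
hard_choice = pick_model(Task(0.8, 0.5), [small, large])
degraded_choice = pick_model(Task(0.8, 0.5), [small, large_down])
```

In this sketch the easy task goes to the cheaper model, the hard task goes to the stronger one, and the hard task falls back to the cheaper model when the stronger one is unavailable. A learned router would replace the fixed threshold rule.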

In practice

Start new Copilot sessions for distinct tasks.
Compact long sessions to reset prompt prefixes.
Enable only necessary tools for a given task.
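
The advice above follows from how prefix-based prompt caches generally behave: a request can reuse cached work only for the leading part of the prompt that is identical to an earlier request. The sketch below assumes that model and uses made-up prompt segments; it is not a description of Copilot's internals.

```python
from typing import Sequence


def shared_prefix_len(previous: Sequence[str], current: Sequence[str]) -> int:
    """Count leading segments that are identical in both prompts."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


# Hypothetical prompt segments, in the order they are sent.
turn_1 = ["system", "tools:v1", "user:1", "assistant:1"]

# Same session, one more user message: the whole earlier prompt is a prefix.
turn_2 = turn_1 + ["user:2"]

# Tool set changed mid-session: everything after "system" differs.
turn_2_new_tools = ["system", "tools:v2", "user:1", "assistant:1", "user:2"]

reusable = shared_prefix_len(turn_1, turn_2)
reusable_after_change = shared_prefix_len(turn_1, turn_2_new_tools)
```

Here `reusable` is 4 and `reusable_after_change` is 1. Under this assumption, changing tools or settings early in the prompt discards most of the cacheable prefix, while a stable session keeps it.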

Topics

GitHub Copilot
LLM Routing
Context Management
Prompt Caching
HyDRA Model
AI Efficiency

Best for: Machine Learning Engineer, NLP Engineer, CTO, Software Engineer, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by The GitHub Blog.