Interactions API: our primary interface for Gemini models and agents
Summary
Google's Interactions API, launched in public beta in December 2025, has reached General Availability on June 26, 2026, establishing itself as the primary interface for Gemini models and agents. This unified endpoint offers server-side state, background execution, tool combination, and multimodal generation. Key enhancements since its beta release include Managed Agents, which provision a remote Linux sandbox for reasoning and code execution, and improved background execution via a "background=True" setting. The API now supports advanced tool integration, allowing a mix of built-in tools like Google Search with custom functions, and enables tool results to return images alongside text. Further upgrades encompass Deep Research with new agent versions and multimodal grounding, alongside media generation capabilities for images (Nano Banana 2), music (Lyria 3), and expressive speech (multi-speaker TTS). A simplified "From Roles to Steps" schema and cost optimizations, such as Flex and Priority tiers offering up to 50% cost reduction, are also part of this release.
Key takeaway
For AI Engineers building new applications or migrating existing Gemini projects, you should prioritize adopting the Interactions API. This new standard provides a unified, agent-first interface with critical features like Managed Agents, background execution, and multimodal generation, which are essential for complex, stateful AI workflows. Transitioning now ensures access to frontier capabilities and cost optimizations, such as the Flex tier's 50% reduction, future-proofing your development efforts.
Key insights
Google's Interactions API unifies Gemini model and agent interactions, enabling complex, stateful, and multimodal AI applications.
Principles
- A unified API simplifies complex AI workflows.
- Agentic design supports stateful, long-running tasks.
- Multimodal capabilities enhance AI interaction fidelity.
Method
The API employs a "From Roles to Steps" schema, where each action is a typed step, and enables asynchronous server-side execution by setting "background=True".
In practice
- Set "background=True" for async operations.
- Integrate Google Search with custom functions.
- Utilize Flex tier for 50% cost savings.
Topics
- Interactions API
- Gemini Models
- AI Agents
- Multimodal Generation
- API Development
- Managed Agents
Best for: Machine Learning Engineer, CTO, VP of Engineering/Data, AI Engineer, Software Engineer, AI Architect
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by The Keyword.