Google Shifts Gemini to Interactions API to Power the New Era of Agents

Google DeepMind has officially designated the Interactions API as the default interface for all Gemini models and agents, marking a fundamental shift in how developers build with Google's AI. By replacing the legacy generateContent interface as the default, Google is pivoting from simple text-in/text-out exchanges toward a multi-step framework designed specifically for autonomous agents.

Moving Beyond Simple Chat to Autonomous Agents

For much of the generative AI era, developers relied on the generateContent method, which was optimized for stateless, single-turn responses. The transition to the Interactions API signals Google's commitment to "Agentic AI": systems that don't just talk, but act.

According to Logan Kilpatrick, Google’s developer relations lead, this API "sets the stage for the new era of Agents." The shift makes room for features that were previously difficult to implement, such as Managed Agents equipped with their own Linux sandboxes. These sandboxes let models execute code in secure, isolated environments, so they can carry out complex computational tasks rather than only predicting the next token.

Advanced Capabilities: Tool Chaining and Background Execution

The Interactions API introduces a suite of high-level capabilities that transform Gemini from a chatbot into a functional assistant. Key technical enhancements include:

  • Tool Chaining: Seamless integration with Google Search and Google Maps allows agents to ground their actions in real-world data.
  • Long-running Tasks: The API supports background execution, allowing agents to work on complex workflows without requiring a constant, active connection from the client.
  • Multimodal Generation: Developers can now orchestrate the generation of images, music, and speech directly through the agentic workflow.
  • State Management: The API handles the complexity of multi-step reasoning, allowing agents to maintain context across diverse tool uses and external calls.
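
The background-execution pattern in the list above can be illustrated with a minimal sketch in plain Python. It does not call the Interactions API: the names BackgroundRunner, submit, status, and result are hypothetical, and a local thread pool stands in for server-side work. It is meant only to show the shape of the workflow, in which the client gets an identifier immediately and checks back later instead of holding a connection open.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional


class BackgroundRunner:
    """Hypothetical stand-in for a service that runs tasks in the background."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._jobs: dict[str, Future] = {}

    def submit(self, fn, *args) -> str:
        """Start a task and return an ID right away, without waiting for it."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(fn, *args)
        return job_id

    def status(self, job_id: str) -> str:
        """Report progress; the caller may poll this at any later time."""
        future = self._jobs[job_id]
        if not future.done():
            return "in_progress"
        return "failed" if future.exception() else "completed"

    def result(self, job_id: str, timeout: Optional[float] = None):
        """Block until the task finishes and return its output."""
        return self._jobs[job_id].result(timeout=timeout)


def slow_research_task(topic: str) -> str:
    # Simulates a multi-step workflow that takes a while to finish.
    time.sleep(0.2)
    return f"report on {topic}"


runner = BackgroundRunner()
job_id = runner.submit(slow_research_task, "flex pricing")
# The client is free to do other work here and come back later.
report = runner.result(job_id, timeout=5)
final_status = runner.status(job_id)
print(final_status, "->", report)
```

In a real deployment the job table would live on the server, so the client could disconnect entirely between submitting and polling; the in-process dictionary here is only a simplification for the sketch.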

A Simplified Schema and Optimized Execution Modes

Google has also streamlined the technical architecture of the API to make it more intuitive for developers. The traditional role-based structure (using labels like "user" and "model") has been replaced by a system of typed "steps." In this new schema, every discrete action, from a user prompt to a function call and the subsequent tool response, is treated as a defined step in a sequence.
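
To make the idea of typed steps concrete, here is a rough sketch in plain Python. The class names, field names, and the "type" strings are illustrative assumptions, not the actual Interactions API schema. The sketch models a user prompt, a function call, and a tool response as three distinct step types, then walks the sequence to find calls that are still waiting for a result.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class UserInputStep:
    text: str
    type: str = "user_input"  # hypothetical type tag


@dataclass
class FunctionCallStep:
    name: str
    arguments: dict[str, Any]
    type: str = "function_call"  # hypothetical type tag


@dataclass
class FunctionResultStep:
    name: str
    result: Any
    type: str = "function_result"  # hypothetical type tag


Step = Union[UserInputStep, FunctionCallStep, FunctionResultStep]


def pending_calls(steps: list[Step]) -> list[str]:
    """Return names of function calls that have no matching result yet."""
    pending: list[str] = []
    for step in steps:
        if isinstance(step, FunctionCallStep):
            pending.append(step.name)
        elif isinstance(step, FunctionResultStep) and step.name in pending:
            pending.remove(step.name)
    return pending


steps: list[Step] = [
    UserInputStep(text="What's the weather in Zurich?"),
    FunctionCallStep(name="get_weather", arguments={"city": "Zurich"}),
]
print(pending_calls(steps))  # the call has no result yet

steps.append(FunctionResultStep(name="get_weather", result={"temp_c": 18}))
print(pending_calls(steps))  # the call has been answered
```

Because each step carries its own type, code that inspects a conversation can branch on what happened rather than on who spoke, which is harder to express with only "user" and "model" labels.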

To address the economic and performance needs of different applications, Google has introduced two distinct execution modes:

  • Flex Mode: Optimized for cost-efficiency, offering a 50 percent cost reduction for developers running large-scale or non-urgent tasks.
  • Priority Mode: Optimized for low latency, ensuring that speed-critical applications receive the fastest possible inference.
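
The trade-off between the two modes can be sketched as simple arithmetic. The snippet below is an illustration under stated assumptions: it treats the 50 percent figure as a discount against a standard price, the monthly spend is a made-up number, and pick_mode and flex_cost are hypothetical helpers rather than part of any Google SDK. No price is assumed for Priority mode, since the figures above do not give one.

```python
FLEX_DISCOUNT = 0.5  # the 50 percent reduction quoted for Flex mode


def flex_cost(standard_cost: float) -> float:
    """Estimated cost of a workload in Flex mode, given its standard cost."""
    return standard_cost * (1 - FLEX_DISCOUNT)


def pick_mode(latency_sensitive: bool) -> str:
    """Speed-critical work goes to Priority; everything else can use Flex."""
    return "priority" if latency_sensitive else "flex"


monthly_standard_cost = 1200.0  # hypothetical spend, in any currency
mode = pick_mode(latency_sensitive=False)
print(mode, flex_cost(monthly_standard_cost))
```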

Why This Matters for the AI Ecosystem

This change signals that the industry is moving past the "chatbot" phase and into the "agent" phase. By standardizing on an API built for tool use, sandboxed execution, and long-running processes, Google is providing the infrastructure necessary for autonomous software that can navigate the web, manage files, and execute code. For developers, this means less time spent managing state and more time building complex, reliable AI workflows.

Key Takeaways

  • API Transition: The Interactions API replaces generateContent as the default for Gemini, enabling advanced agentic features like Linux sandboxing and tool chaining.
  • New Execution Modes: Developers can now choose between Flex mode (50% cost savings) and Priority mode (optimized for speed).
  • Structural Shift: The API moves from a "user/model" role structure to a "typed steps" schema, better reflecting the multi-step nature of autonomous agents.