Gemini 3.5 Flash Now Has Native Computer Use
Google updated Gemini 3.5 Flash on June 24, 2026. It now includes native computer use. This means the model can interact with screens directly.
Before this update, developers had two options: run a separate model for screen control, or build pipelines that passed data between different models. Both added cost and engineering work.
Now, computer use is a standard tool. You can call it alongside Search and Maps in a single step.
What changes for you:
- Single inference pass: One agent can browse the web, use enterprise apps, and check Maps without switching models.
- Larger context: The window grew from 128K to 1 million tokens. This helps with long tasks.
- Better reasoning: Every action now includes an intent field. It explains why the model clicked or typed. This creates an audit trail for compliance.
- Lower costs: Gemini 3.5 Flash costs $1.50 per million input tokens. GPT-5.5 costs $5.00. That makes Gemini's input price less than a third of GPT-5.5's, which matters at scale.
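To see what the price gap means for a real workload, here is a rough sketch of the arithmetic. It uses only the two input-token prices quoted above. The dictionary keys are plain labels, not confirmed API model IDs, and the workload (200 steps at 4,000 input tokens each) is a made-up example. Output-token costs are not included because the post does not quote them.

```python
# Input-token prices quoted in the post, in USD per million tokens.
# The keys are labels for this example, not confirmed API model IDs.
PRICE_PER_MILLION_INPUT = {
    "gemini-3.5-flash": 1.50,
    "gpt-5.5": 5.00,
}


def input_cost(model: str, input_tokens: int) -> float:
    """Return the input-token cost in USD for one model."""
    return PRICE_PER_MILLION_INPUT[model] * input_tokens / 1_000_000


# Hypothetical workload: 200 agent steps, 4,000 input tokens per step.
tokens = 200 * 4_000  # 800,000 input tokens in total
gemini = input_cost("gemini-3.5-flash", tokens)
gpt = input_cost("gpt-5.5", tokens)
print(f"Gemini: ${gemini:.2f}, GPT-5.5: ${gpt:.2f}, ratio: {gpt / gemini:.2f}x")
```

Screenshots are token-heavy, so the per-step token count is the number to measure in your own logs before trusting any estimate like this.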
How it works:
- Your app takes a screenshot.
- The API receives the image and your goal.
- The model picks a UI element and returns a command like a click or a scroll.
- Your app executes the command and repeats the process.
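The four steps above can be sketched as a simple loop. This is a minimal illustration, not the real API: the Action shape, the "done" signal, and the three callables (take_screenshot, ask_model, execute) are all hypothetical stand-ins. In a real app, ask_model would wrap the actual API call and execute would drive a browser or device.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One command returned by the model (hypothetical shape, not the real schema)."""
    kind: str                   # e.g. "click", "scroll", "type", or "done"
    x: Optional[int] = None     # screen coordinates, when the action needs them
    y: Optional[int] = None
    text: Optional[str] = None  # text to type, when the action needs it


def run_agent(
    goal: str,
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[bytes, str], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Screenshot, ask the model, execute, repeat until "done" or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()        # step 1: capture the screen
        action = ask_model(screenshot, goal)  # steps 2-3: send image and goal, get a command
        history.append(action)
        if action.kind == "done":             # assumed signal that the goal is reached
            break
        execute(action)                       # step 4: perform the command, then loop
    return history


# Demo with stand-ins: a scripted "model" that clicks once, then finishes.
scripted = iter([Action("click", x=120, y=48), Action("done")])
executed: List[Action] = []

history = run_agent(
    goal="Open the settings page",
    take_screenshot=lambda: b"fake-png-bytes",
    ask_model=lambda image, goal: next(scripted),
    execute=executed.append,
)
print([a.kind for a in history])  # ['click', 'done']
```

The max_steps cap is a design choice worth keeping in any real version. Without it, a confused agent can loop forever and burn tokens.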
Safety is a major concern. An agent can perform irreversible actions like sending emails or making payments. Google added layers to manage this:
- Adversarial training to stop prompt injection.
- Human confirmation for sensitive actions.
- Seven safety categories to block specific tasks like financial transactions.
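Your app can add its own confirmation gate on top of these layers. The sketch below is an app-side check, not Google's built-in mechanism. The action labels in SENSITIVE_KINDS and the confirm callback are assumptions made for illustration.

```python
from typing import Callable, List, Set

# Hypothetical action labels an app might treat as irreversible.
SENSITIVE_KINDS: Set[str] = {"send_email", "make_payment"}


def guarded_execute(
    kind: str,
    execute: Callable[[str], None],
    confirm: Callable[[str], bool],
) -> bool:
    """Run the action unless it is sensitive and the human declines."""
    if kind in SENSITIVE_KINDS and not confirm(kind):
        return False  # blocked: nothing was executed
    execute(kind)
    return True


performed: List[str] = []


def deny_all(kind: str) -> bool:
    """Stand-in for a real approval prompt; always declines."""
    return False


print(guarded_execute("scroll", performed.append, deny_all))        # True
print(guarded_execute("make_payment", performed.append, deny_all))  # False
print(performed)                                                    # ['scroll']
```

In production, confirm would show the pending action to a person and wait for an explicit yes.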
The model supports over 20 action types. These include clicks, typing, scrolling, and dragging across browsers, mobile, and desktop.
The gap between benchmarks and real-world use remains. Apps change often and authentication flows are tricky. Start with read-only tasks. Once you trust the logs, move to workflows that require human approval.
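One way to start read-only is an allowlist with an audit log. This is a rough sketch under assumptions: the READ_ONLY_KINDS set and the dictionary shape of each action are invented for the example. Only the idea of an intent attached to each action comes from the update described above.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical set of action kinds that cannot change anything on screen.
READ_ONLY_KINDS = {"scroll", "wait"}


def review_actions(
    actions: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split proposed actions into allowed ones and JSON audit-log lines."""
    allowed: List[Dict[str, str]] = []
    log_lines: List[str] = []
    for action in actions:
        ok = action["kind"] in READ_ONLY_KINDS
        if ok:
            allowed.append(action)
        # Log every proposal, allowed or not, with the model's stated intent.
        log_lines.append(json.dumps({**action, "allowed": ok}, sort_keys=True))
    return allowed, log_lines


proposed = [
    {"kind": "scroll", "intent": "Find the invoice table further down the page"},
    {"kind": "click", "intent": "Press the Pay button"},
]
allowed, log_lines = review_actions(proposed)
for line in log_lines:
    print(line)
```

Blocked actions are logged too. Reading what the agent wanted to do, and why, is how you decide when to widen the allowlist.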
Computer use is moving from a premium add-on to a standard tool.
