Google DeepMind has unveiled the Gemini 2.5 Computer Use model, a specialized model built on Gemini 2.5 Pro and designed to interact with graphical user interfaces. Available through the Gemini API in Google AI Studio and Vertex AI, it lets AI agents carry out tasks by operating UI elements directly, for example by filling out forms and clicking buttons. DeepMind presents this capability as crucial for building robust, general-purpose AI agents.
Developers access the model through a tool that runs in a loop: each request carries the user's goal, a current screenshot, and the history of prior actions, and the model responds with a UI action that client-side code then executes. The model is optimized for web and mobile UI tasks, though not yet for desktop OS-level operations. It has performed strongly in benchmarks, reaching over 70% accuracy at a latency of 225 seconds.
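The loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the Gemini API: `Action`, `run_agent_loop`, `propose_action`, `take_screenshot`, and `execute` are hypothetical names, and a scripted stand-in plays the role of the model so the example runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """A UI action proposed by the model (hypothetical shape, not the real API)."""
    name: str                      # e.g. "click", "type_text", or "done"
    args: dict = field(default_factory=dict)


def run_agent_loop(
    goal: str,
    propose_action: Callable[[str, bytes, List[Action]], Action],
    take_screenshot: Callable[[], bytes],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat: capture the screen, ask the model for an action, execute it client-side."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                  # current UI state
        action = propose_action(goal, screenshot, history)
        if action.name == "done":                       # model signals completion
            break
        execute(action)                                 # client-side code performs the action
        history.append(action)                          # fed back on the next turn
    return history


# Scripted stand-in for the model: returns the next step based on how many ran so far.
def scripted_model(goal: str, screenshot: bytes, history: List[Action]) -> Action:
    script = [
        Action("click", {"x": 120, "y": 40}),
        Action("type_text", {"text": "hello"}),
        Action("done"),
    ]
    return script[len(history)]


executed: List[Action] = []
history = run_agent_loop(
    "Fill in the form",
    scripted_model,
    take_screenshot=lambda: b"fake-png-bytes",
    execute=executed.append,
)
print([a.name for a in history])  # ['click', 'type_text']
```

In a real integration, `propose_action` would wrap a call to the model and `execute` would drive a browser or device; the `max_steps` cap is a design choice here to keep a misbehaving loop from running indefinitely.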
DeepMind emphasizes safety: the model incorporates safeguards against misuse, and developers can require confirmation of high-stakes actions before they are executed. This makes Gemini 2.5 Computer Use a significant advancement in AI technology.
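One way a client might gate high-stakes actions is sketched below. It assumes, purely for illustration, that each proposed action arrives as a dictionary with a `requires_confirmation` flag; that field name, and the `execute_with_confirmation` helper, are hypothetical and not taken from the Gemini API.

```python
from typing import Callable, List


def execute_with_confirmation(
    action: dict,
    execute: Callable[[dict], None],
    confirm: Callable[[dict], bool],
) -> bool:
    """Run the action unless it is flagged as high-stakes and the user declines."""
    # "requires_confirmation" is an assumed flag name used only for this sketch.
    if action.get("requires_confirmation") and not confirm(action):
        return False               # user declined: skip the action
    execute(action)
    return True


performed: List[dict] = []

# A low-risk action runs without prompting anyone.
scroll = {"name": "scroll", "args": {"dy": 300}}
ran_scroll = execute_with_confirmation(scroll, performed.append, confirm=lambda a: False)

# A flagged action is skipped when the confirmation callback says no.
purchase = {"name": "click", "args": {"label": "Buy now"}, "requires_confirmation": True}
ran_purchase = execute_with_confirmation(purchase, performed.append, confirm=lambda a: False)

print(ran_scroll, ran_purchase, len(performed))  # True False 1
```

In an interactive application, `confirm` would typically prompt the end user rather than return a fixed value.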