Google has launched the Gemini 2.5 Computer Use model, an AI model designed to operate web interfaces the way a human user does. Built on Gemini 2.5 Pro, it lets AI agents interact directly with user interfaces (UIs) by performing actions such as clicking, typing, and scrolling. According to Google, the model outperforms competing models on web and mobile control benchmarks while running at lower latency. It is available through the Gemini API in Google AI Studio and Vertex AI. At each step of a task, the model takes the user's request, a screenshot of the current screen, and the history of recent actions, and returns the next UI action to perform. This makes it suited to tasks such as filling in forms and submitting data. The model is currently optimized for web browsers and shows promise for mobile UI control, but it is not yet equipped for desktop OS control. Google presents the release as a significant step forward in how AI agents interact with software.
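The loop described above (request, screenshot, and action history in; next UI action out) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Action`, `run_agent`, `propose_action`, and the fake form are hypothetical names invented for this example, and the "model" is a scripted stub, not a call to the Gemini API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical shape)."""
    kind: str                     # "type", "click", "scroll", or "done"
    target: Optional[str] = None  # which UI element the action applies to
    text: Optional[str] = None    # text to enter, for "type" actions


def run_agent(
    request: str,
    take_screenshot: Callable[[], bytes],
    propose_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat: capture the screen, ask for the next action, perform it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        # The model sees the request, the current screen, and past actions.
        action = propose_action(request, screenshot, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history


def demo():
    """Run the loop against a fake one-field form with a scripted 'model'."""
    form = {"email": "", "submitted": False}

    def take_screenshot() -> bytes:
        # Stand-in for a real screenshot: a text rendering of the fake page.
        return repr(form).encode()

    def propose_action(request, screenshot, history):
        # Scripted stub: a real system would query the model here instead.
        script = [
            Action("type", target="email", text="user@example.com"),
            Action("click", target="submit"),
            Action("done"),
        ]
        return script[len(history)]

    def execute(action: Action) -> None:
        if action.kind == "type":
            form[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            form["submitted"] = True

    history = run_agent(
        "Fill in the email field and submit the form",
        take_screenshot,
        propose_action,
        execute,
    )
    return form, history


if __name__ == "__main__":
    final_form, actions = demo()
    print(final_form)
    print([a.kind for a in actions])
```

Keeping the screenshot capture, the model call, and the action executor as separate callables is a design choice for this sketch: it lets the loop be tested with stubs, and the `max_steps` cap prevents a misbehaving model from looping forever.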