Wednesday, July 23, 2025

Google’s Gemini 2.5 Introduces Innovative Support for Conversational Image Segmentation

Google’s Gemini 2.5 AI model now supports “conversational image segmentation,” which lets users analyze images with natural language prompts. Traditional image segmentation assigns pixels to a fixed set of categories. Gemini, by contrast, can interpret complex queries such as “the person with the umbrella” and abstract concepts such as “clutter.” The capability also uses built-in text recognition, so it can identify items such as “pistachio baklava” by reading text in the image.

The feature has practical uses in several fields: designers can select image regions with verbal commands, workplace safety teams can highlight violations, and insurance adjusters can tag storm-damaged homes in aerial shots. It is available through the Gemini API, which returns results as JSON that lists the coordinates and label of each selected region. For efficiency, Google recommends the gemini-2.5-flash model. Developers can try the feature in Google AI Studio or in a Python Colab notebook. The feature marks a significant step forward in AI-driven image analysis and editing.
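To make the JSON output more concrete, here is a minimal sketch of how a client might turn the model’s reply into pixel-space boxes. It does not call the Gemini API; it only parses a response string. The field names (`label`, `box_2d`), the [ymin, xmin, ymax, xmax] ordering, and the 0 to 1000 normalization are assumptions for illustration, as are the helper names `Segment` and `parse_segments`. Check the official Gemini API documentation for the exact schema. Any other fields in each entry (for instance a mask, if present) are ignored here.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    """One selected region (hypothetical helper type)."""
    label: str
    # Pixel-space box as (x0, y0, x1, y1).
    box: tuple[int, int, int, int]


def parse_segments(response_text: str, width: int, height: int) -> list[Segment]:
    """Convert a JSON list of {"label", "box_2d"} entries into pixel boxes.

    Assumption: box_2d is [ymin, xmin, ymax, xmax], normalized to 0-1000.
    """
    fence = "`" * 3
    text = response_text.strip()
    # Model replies are sometimes wrapped in a markdown code fence; strip it.
    if text.startswith(fence):
        text = text.split("\n", 1)[1].rsplit(fence, 1)[0]

    segments = []
    for item in json.loads(text):
        ymin, xmin, ymax, xmax = item["box_2d"]
        # Rescale from the assumed 0-1000 grid to the image's pixel size.
        box = (
            round(xmin / 1000 * width),
            round(ymin / 1000 * height),
            round(xmax / 1000 * width),
            round(ymax / 1000 * height),
        )
        segments.append(Segment(item["label"], box))
    return segments


if __name__ == "__main__":
    sample = '[{"box_2d": [100, 200, 500, 600], "label": "person with the umbrella"}]'
    for seg in parse_segments(sample, width=1000, height=800):
        print(seg.label, seg.box)
```

Keeping the parsing separate from the API call makes it easy to test against saved responses and to adapt if the real schema differs from the one assumed here.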
