Recent research published in Patterns highlights a limitation of AI image generation: pipelines built from models such as Stable Diffusion XL and LLaVA tend to default to a handful of generic motifs, despite being trained on vast visual datasets. In an experiment modeled on a visual game of telephone, Stable Diffusion XL generated an image from a prompt, LLaVA described that image in text, and the description was fed back to Stable Diffusion XL as the next prompt. The loop continued for 100 rounds. Most of the image sequences converged on just 12 common styles, which the researchers described as “visual elevator music,” with typical scenes including lighthouses, rustic architecture, and urban nightscapes. The researchers noted that the same pattern persisted even when the models were swapped, which they took as evidence of a lack of true creativity in AI. Unlike humans, whose varied interpretations make the results of such a game diverge widely, the AI loop consistently reverted to familiar styles. This suggests that while generating content is feasible, instilling genuine aesthetic qualities remains a challenge, and it underscores the need to address biases in AI training datasets.