A recent study highlights significant shortcomings in the reliability of claims made by generative AI search tools. Researchers, including Pranav Narayanan Venkit of Salesforce AI Research, tested several AI search engines, including OpenAI's GPT-4.5, Microsoft's Bing Chat, and You.com, by analyzing their responses to 303 queries. Around a third of the claims in the responses lacked supporting evidence, and GPT-4.5 fared worst, with 47% of its statements unsupported.

The evaluation framework, termed DeepTrace, scored responses on criteria such as bias, relevance, and citation quality. Some queries addressed contentious topics, while others were designed to test expertise across various fields. Critics, including academics at Oxford and Zurich, have questioned the study's methodology, in particular its reliance on AI models to perform the assessment. Nonetheless, the findings underline the urgent need for improvements in accuracy, citation diversity, and users' understanding of AI-generated information if these tools are to be trusted as they proliferate.
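To make the reported figures concrete, here is a minimal, hypothetical sketch of how an unsupported-claim rate like the study's roughly 33% overall (or GPT-4.5's 47%) could be computed once individual claims have been annotated as supported or not. The `Claim` structure and labels here are illustrative assumptions, not the actual DeepTrace framework.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str        # a single factual statement extracted from an AI answer
    supported: bool  # whether any cited source actually backs the statement

def unsupported_rate(claims: list[Claim]) -> float:
    """Fraction of extracted claims that have no supporting citation."""
    if not claims:
        return 0.0
    return sum(not c.supported for c in claims) / len(claims)

# Hypothetical annotations: 2 of 3 claims unsupported -> ~67%
claims = [
    Claim("X is true", supported=True),
    Claim("Y happened in 2020", supported=False),
    Claim("Z causes W", supported=False),
]
print(f"Unsupported-claim rate: {unsupported_rate(claims):.0%}")
```

In the study itself, rates like these were aggregated per engine across all 303 queries, which is how per-system figures such as GPT-4.5's 47% arise.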