OpenAI’s health lead, Singhal, says the latest GPT-5 models are better at soliciting information from users, though he reports that GPT-5.4 is less effective at context-seeking than GPT-5.2. Researcher Bean argues that health chatbots should undergo controlled human trials before public release, while acknowledging that the pace of AI development makes such testing difficult; his own study relied on the now-outdated GPT-4o.

Meanwhile, Google’s recent study of its AMIE medical chatbot, which has not been publicly released, found its diagnoses to be as accurate as those of human physicians, with no major safety issues. Despite those promising results, Google is holding back AMIE’s launch, citing the need for further research on safety, equity, and fairness, alongside plans for a health platform featuring an AI assistant.

Rodman questions whether multiyear clinical studies can keep pace with chatbot development and advocates instead for evaluations by trusted third parties, which would ensure impartiality and reduce blind spots. Singhal supports external evaluation and points to frameworks such as Stanford’s MedHELM, where OpenAI’s GPT-5 currently performs best.