This study develops an evaluation strategy to compare the performance of Large Language Models (LLMs) with Exomiser, a specialized tool for genetic differential diagnosis. LLMs produce free-text responses, while Exomiser outputs ranked lists encoded with OMIM and Orphanet codes. Focusing on phenotypic findings, we normalized diagnoses by treating clinically identical diseases as equivalent for ranking purposes. We evaluated LLMs on 5,213 computational case reports formatted as phenopackets, covering a range of genetic syndromes annotated with structured Human Phenotype Ontology (HPO) terms. We automated the generation of diagnostic prompts with our software, phenopacket2prompt, available on GitHub. Exomiser generated diagnoses in phenotype-only mode. Our evaluation strategy used Mondo Disease Ontology terms to score LLM diagnoses against gold standards from curated publications, improving comparability. Performance was analyzed by clustering cases according to organ specificity and the number of observed HPO terms, characterizing LLM capabilities in differential diagnosis within human genetics.
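The ranking evaluation described above can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration, not the authors' actual pipeline: it assumes diagnoses have already been mapped to Mondo identifiers, and that a lookup table supplies the equivalence classes of clinically identical diseases. The function name, identifiers, and data are all invented for illustration.

```python
def top_k_accuracy(ranked_lists, gold_ids, equivalents, k=1):
    """Fraction of cases whose gold diagnosis (or a clinically
    equivalent disease) appears in the top k of the ranked differential.

    ranked_lists: list of ranked Mondo IDs, one list per case
    gold_ids:     gold-standard Mondo ID for each case
    equivalents:  maps a gold ID to its set of equivalent Mondo IDs
    """
    hits = 0
    for ranked, gold in zip(ranked_lists, gold_ids):
        accepted = equivalents.get(gold, {gold})  # equivalence class
        if any(d in accepted for d in ranked[:k]):
            hits += 1
    return hits / len(ranked_lists) if ranked_lists else 0.0


# Hypothetical example with two cases and made-up Mondo-style IDs:
# the first model answer is an equivalent disease (hit at rank 1),
# the second lists the gold diagnosis only at rank 2.
equiv = {"MONDO:0000001": {"MONDO:0000001", "MONDO:0000002"}}
ranked = [["MONDO:0000002", "MONDO:0000009"],
          ["MONDO:0000008", "MONDO:0000001"]]
gold = ["MONDO:0000001", "MONDO:0000001"]
print(top_k_accuracy(ranked, gold, equiv, k=1))  # → 0.5
print(top_k_accuracy(ranked, gold, equiv, k=2))  # → 1.0
```

Treating equivalent diseases as a single class before ranking is what makes free-text LLM output and Exomiser's coded output comparable under one metric.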
Comprehensive Benchmarking Reveals That Large Language Models Fall Short of Traditional Tools in Diagnosing Rare Diseases