Assessing the Effectiveness of General-Purpose Large Language Models in Detecting Human Facial Emotions

The study, which was deemed exempt by the Beth Israel Deaconess Medical Center institutional review board (IRB), used the NimStim dataset to evaluate facial emotion recognition by large language models (LLMs). NimStim comprises 672 facial expression images posed by 43 racially diverse actors and covers eight emotional expressions; psychometric evaluations of the set have shown high reliability and strong agreement among observers. Two LLMs, OpenAI’s GPT-4o and Google’s Gemini 2.0, each processed the images using a standardized procedure. The researchers confirmed that the NimStim dataset is not publicly accessible, which helps preserve its integrity as an evaluation benchmark.

To assess performance, the researchers calculated Cohen’s kappa for each model and compared it with the established NimStim kappa values, and they built confusion matrices to derive accuracy, precision, recall, and F1 scores. For each emotion category, overlapping confidence intervals between a model’s kappa and the NimStim benchmark were taken to indicate comparable agreement, while non-overlapping intervals indicated a difference in agreement.
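The agreement and classification metrics described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: it assumes each model's answer has already been mapped onto one of the dataset's emotion labels, and every function name here (`confusion_matrix`, `cohen_kappa`, `category_kappa`, `accuracy`, `per_class_metrics`, `intervals_overlap`) is hypothetical. The one-vs-rest `category_kappa` is one common way to obtain a per-emotion kappa; the summary does not say how the study computed per-category values, and confidence intervals are taken as given rather than estimated.

```python
# Hypothetical helpers for scoring an LLM's emotion labels against ground truth.
# Assumes every model answer has already been mapped onto one of `labels`.


def confusion_matrix(y_true, y_pred, labels):
    """Counts of (true, predicted) pairs; rows are true labels, columns predicted.

    A label missing from `labels` raises KeyError, so unexpected model answers
    must be mapped or filtered beforehand.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def cohen_kappa(y_true, y_pred, labels):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e)."""
    m = confusion_matrix(y_true, y_pred, labels)
    k = len(labels)
    n = sum(sum(row) for row in m)
    p_o = sum(m[i][i] for i in range(k)) / n  # observed agreement
    row_totals = [sum(m[i]) for i in range(k)]
    col_totals = [sum(m[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from each rater's label frequencies.
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined: both raters used one label throughout
    return (p_o - p_e) / (1 - p_e)


def category_kappa(y_true, y_pred, category):
    """One-vs-rest kappa for a single emotion (an assumed per-category method)."""
    return cohen_kappa([t == category for t in y_true],
                       [p == category for p in y_pred],
                       labels=[True, False])


def accuracy(y_true, y_pred):
    """Fraction of items where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels):
    """Precision, recall, and F1 for each label, read off the confusion matrix."""
    m = confusion_matrix(y_true, y_pred, labels)
    k = len(labels)
    metrics = {}
    for i, label in enumerate(labels):
        tp = m[i][i]
        fp = sum(m[r][i] for r in range(k)) - tp  # predicted `label`, was something else
        fn = sum(m[i]) - tp  # was `label`, predicted something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def intervals_overlap(a, b):
    """True if closed confidence intervals a = (lo, hi) and b = (lo, hi) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    labels = ["happy", "sad", "angry"]
    y_true = ["happy", "happy", "sad", "sad", "angry", "angry"]
    y_pred = ["happy", "happy", "sad", "angry", "angry", "sad"]
    print(round(cohen_kappa(y_true, y_pred, labels), 3))  # 0.5
    print(round(category_kappa(y_true, y_pred, "sad"), 3))  # 0.25
    print(per_class_metrics(y_true, y_pred, labels)["sad"])
```

Rows of the confusion matrix are true labels and columns are predictions, so precision for a label is read down its column and recall across its row. Only the standard library is used, which keeps the arithmetic of each metric visible.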
