Monday, December 1, 2025

Are AI Models Becoming Safer Over Time?

Are Frontier Models Really Getting Safer?

This analysis reviews 18 months of Lamb-Bench safety scores for GPT and Claude models. The main takeaway: models have become more capable, but a newer release is not guaranteed to be a safer one.

Key Findings:

  • Newer ≠ Safer: Safety scores for both GPT and Claude models regress noticeably after peaking. GPT-4o leads on safety, while Claude 3.5 Sonnet remains the safest model in the Claude lineage.
  • Volatility in Scores:
    • GPT Models: Scores fluctuate significantly, ranging from 69 to 87.
    • Claude Models: Scores move more smoothly but trend downward, from 83 to 76.
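
To make the two volatility figures above concrete, here is a minimal sketch of how a spread and a drop from peak could be computed. The function names are hypothetical, and only the endpoint values quoted in this post are used; the per-release scores are not given here and are not filled in.

```python
from typing import Sequence


def score_spread(scores: Sequence[float]) -> float:
    """Difference between the highest and lowest score in a series."""
    return max(scores) - min(scores)


def drop_from_peak(scores: Sequence[float]) -> float:
    """How far the most recent score sits below the best score in the series."""
    return max(scores) - scores[-1]


# Assumption: only the endpoints quoted in the post are known.
gpt_range = [69, 87]      # lowest and highest GPT scores reported
claude_trend = [83, 76]   # Claude scores at the start and end of the trend

print(score_spread(gpt_range))       # 18
print(drop_from_peak(claude_trend))  # 7
```

With a full score history per release, the same two functions would show whether the latest model sits at its family's peak or below it.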

Critical Insights:

  • Context Matters: Choose a model based on the risk profile of your specific use case.
  • Safety as Layer 0: Add your own safeguards rather than relying on vendor promises alone.
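
One way to picture adding safeguards on top of a vendor model is a thin wrapper that checks the prompt before the call and the reply after it. This is only a sketch under stated assumptions: `guarded_call`, `echo_model`, and `no_secrets` are hypothetical names, the model is a local stand-in rather than a real vendor API, and the keyword check is a placeholder for whatever policy fits your risk profile.

```python
from typing import Callable, Iterable

# Hypothetical check signature: returns True when the text is acceptable.
Check = Callable[[str], bool]

REFUSAL = "Request blocked by application policy."


def guarded_call(
    model: Callable[[str], str],
    prompt: str,
    input_checks: Iterable[Check],
    output_checks: Iterable[Check],
) -> str:
    """Run application-level checks before and after the model call."""
    if not all(check(prompt) for check in input_checks):
        return REFUSAL
    reply = model(prompt)
    if not all(check(reply) for check in output_checks):
        return REFUSAL
    return reply


# Stand-ins for illustration only; replace with a real client and real checks.
def echo_model(prompt: str) -> str:
    return f"echo: {prompt}"


def no_secrets(text: str) -> bool:
    return "password" not in text.lower()


print(guarded_call(echo_model, "hello", [no_secrets], [no_secrets]))
```

Keeping the checks outside the model call means they stay in place when the underlying model is swapped for a newer release.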

Conclusion

Building with AI? Treat safety as an empirical question: check the Lamb-Bench scores for the models you are considering, and tailor your guardrails to your needs.

Join the conversation! Share your thoughts on model safety and let’s explore together!
