Introducing HN: Sales Agent Benchmark – Open Source SWE-Bench for Sales AI Agents

🚀 Transform Your Sales AI Evaluation! 🚀

Introducing an open-source benchmark that measures how well LLMs perform as sales agents. Built out of a need for rigorous evaluation beyond polished demos, it tests models against real-world deal data.

Key Features:

  • Two Evaluation Modes:

    • Summary Benchmark: Analyze 15 deals with structured checkpoints. Models score 68–81% here.
    • Artifact-Based Benchmark: Dive into multi-turn analysis with real transcripts and emails. Models score significantly lower at 26–38%.
  • Notable Findings:

    • Risk Identification Drops: The best model's risk-identification score plummets from 8.0 to 2.3 when analyzing real data.
    • Hallucinated Stakeholders: Models invent names not present in artifacts.
    • Quality Structure: MEDDPICC scoring holds up at 7.5/10.
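
To make the hallucinated-stakeholder finding concrete, here is a minimal Python sketch of one way such a check could work. It is not the benchmark's actual implementation: the function name, the substring-matching rule, and the sample data are all assumptions made up for illustration.

```python
def hallucinated_stakeholders(claimed_names, artifacts):
    """Return the claimed names that never appear in any artifact text.

    Hypothetical helper: uses a simple case-insensitive substring match,
    which is an assumption, not the benchmark's real scoring rule.
    """
    corpus = " ".join(artifacts).casefold()
    return [name for name in claimed_names if name.casefold() not in corpus]


# Invented sample data, for illustration only.
artifacts = [
    "Call transcript: Dana Lee asked about pricing for the enterprise tier.",
    "Email from Sam Ortiz: legal review is scheduled for next week.",
]
claimed = ["Dana Lee", "Sam Ortiz", "Priya Patel"]

print(hallucinated_stakeholders(claimed, artifacts))  # ['Priya Patel']
```

A plain substring match like this would miss nicknames or partial names, so a real checker would likely need fuzzier matching.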

🔗 Join the movement! Register your API endpoint now to benchmark your agent.

💬 Have anonymized deal artifacts? Let's collaborate!

Please share this to broaden the conversation on AI evaluation!
