🚀 Transform Your Sales AI Evaluation! 🚀
Introducing an open-source benchmark for evaluating how well LLMs perform as sales agents. Built out of the need for rigorous evaluation beyond polished demos, it tests models against real-world deal data rather than curated examples.
Key Features:
- Two Evaluation Modes (scoring sketch below):
- Summary Benchmark: Analyze 15 deals with structured checkpoints. Models score 68–81% here.
- Artifact-Based Benchmark: Dive into multi-turn analysis with real transcripts and emails. Models score significantly lower at 26–38%.
- Notable Findings:
- Risk Identification Drops: the best model's score falls from 8.0 to 2.3 when analyzing real data.
- Hallucinated Stakeholders: models invent stakeholder names that appear nowhere in the artifacts.
- Structure Holds Up: MEDDPICC scoring stays solid at 7.5/10.
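
Curious how the summary checkpoints turn into a score? Here's a minimal, hypothetical sketch in Python. The `Checkpoint` class and `score_summary` function are illustrative names, not the benchmark's actual API:

```python
from dataclasses import dataclass

@dataclass
class Checkpoint:
    """One expected fact or judgment a deal summary should contain (illustrative only)."""
    description: str
    satisfied: bool = False

def score_summary(checkpoints: list[Checkpoint]) -> float:
    """Return the fraction of structured checkpoints the model satisfied."""
    if not checkpoints:
        return 0.0
    return sum(cp.satisfied for cp in checkpoints) / len(checkpoints)

# A summary that hits 3 of 4 checkpoints scores 75%,
# in the same range as the 68-81% summary-mode results above.
deal_checkpoints = [
    Checkpoint("Identifies the economic buyer", satisfied=True),
    Checkpoint("Names the competing vendor", satisfied=True),
    Checkpoint("Flags the stalled legal review as a risk", satisfied=False),
    Checkpoint("States the expected close quarter", satisfied=True),
]
print(f"Summary score: {score_summary(deal_checkpoints):.0%}")
```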
🔗 Join the movement! Register your API endpoint now to benchmark your agent here.
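
If you're wondering what an endpoint might look like before registering, here's a hedged sketch of one possible shape. FastAPI is used purely for illustration; the route, field names, and response schema are assumptions, not the benchmark's actual contract:

```python
# Hypothetical agent endpoint the benchmark could call with deal artifacts.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class DealArtifacts(BaseModel):
    deal_id: str
    transcripts: list[str]  # raw call transcripts
    emails: list[str]       # email threads for the deal

class AgentAnalysis(BaseModel):
    summary: str
    risks: list[str]
    stakeholders: list[str]  # should only contain names found in the artifacts

@app.post("/analyze", response_model=AgentAnalysis)
def analyze(deal: DealArtifacts) -> AgentAnalysis:
    # Replace this stub with a call to your sales agent / LLM pipeline.
    return AgentAnalysis(
        summary=f"Analysis for deal {deal.deal_id}",
        risks=["Legal review has not started"],
        stakeholders=[],
    )
```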
💬 Have anonymized deal artifacts? Let’s collaborate!
Please share this to broaden the conversation on AI evaluation!