Unlock Visual Reasoning with ClockBench 🌟
Introducing ClockBench, a benchmark for visual reasoning in AI. The public dataset features 10 clocks from a meticulously constructed collection of 180, ensuring robustness while safeguarding sensitive training data.
Key Features:
- Public Access: Dive into a transparent dataset that fosters research without compromising privacy.
- Hands-On Evaluation: Utilize two scripts designed for seamless model evaluation:
  - python3 clockbench_evaluate.py: Evaluate your model with OpenRouter API using your API key.
  - python3 clockbench_grade.py: Easily grade results and obtain comprehensive JSON outputs.
- Open to Collaboration: Pull requests are welcomed, encouraging community-driven enhancements.
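If you want to chain the two scripts, here is a minimal sketch of one way to do it. Only the script names come from the description above. The environment variable name `OPENROUTER_API_KEY` and the helper functions `build_commands` and `run_pipeline` are assumptions made for illustration, so check the repository for how the key is actually supplied.

```python
import os
import subprocess

# The two script names come from the post; everything else is illustrative.
SCRIPTS = ["clockbench_evaluate.py", "clockbench_grade.py"]

# Assumption: the API key lives in this environment variable.
# The post only says the evaluation script uses "your API key".
API_KEY_ENV = "OPENROUTER_API_KEY"


def build_commands(python="python3"):
    """Return the evaluate and grade commands in the order they should run."""
    return [[python, script] for script in SCRIPTS]


def run_pipeline(dry_run=True, env=None):
    """Run evaluation, then grading, stopping at the first failure.

    With dry_run=True the commands are returned without being executed.
    """
    env = dict(os.environ) if env is None else env
    commands = build_commands()
    if dry_run:
        return commands
    if not env.get(API_KEY_ENV):
        raise RuntimeError(f"Set {API_KEY_ENV} before running the evaluation step")
    for cmd in commands:
        # check=True raises CalledProcessError if a script exits non-zero,
        # so grading never runs on a failed evaluation.
        subprocess.run(cmd, check=True, env=env)
    return commands
```

Calling `run_pipeline()` with the default `dry_run=True` only returns the two commands, which is a safe way to confirm the order before spending API credits. Pass `dry_run=False` from inside the repository directory to actually run them.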
Whether you’re an AI developer, a researcher, or an enthusiast, ClockBench equips you with essential tools to elevate your projects.
🔗 Explore the future of AI visual reasoning! Visit ClockBench and share your thoughts in the comments! 💬 Let’s advance together!