TOUCAN: The Largest Open Training Dataset for AI Agents

October 7, 2025

The MIT-IBM Watson AI Lab and the University of Washington introduced TOUCAN, the largest open dataset for training AI agents, featuring 1.5 million real tool interactions. This comprehensive dataset aims to enhance open models by demonstrating effective use of real-world tools, addressing a significant gap in existing datasets. Hosted on public MCP servers, TOUCAN documents detailed interactions involving over 2,000 tools across various domains, including finance and web development, capturing realistic errors and context dependencies through actual API executions. Unlike past datasets, which relied on simulations, TOUCAN offers improved tool execution insights. Testing showed significant enhancements in Qwen-2.5 models fine-tuned with TOUCAN, surpassing larger systems and yielding notable performance gains on multiple benchmarks, thereby advancing the capabilities of smaller models. Available on GitHub and Hugging Face, TOUCAN underscores the importance of high-quality training data for open-source AI development, while future plans include tool simulation and benchmarking enhancements.

Source link

{{post_title}}

TOUCAN: The Largest Open Training Dataset for AI Agents

NO COMMENTS

LEAVE A REPLY Cancel reply

Loading…

Here are the results for the search: "{{td_search_query}}"

No results!

{{post_title}}

RELATED ARTICLES

AI Revolutionizes Cybersecurity Access: Empowering Defenders with Advanced Tools

Adobe Unveils Firefly AI Assistant, Featuring Enhanced Generative AI and Creative...

IDC MarketScape: Vendor Assessment of Global AI-Driven Enterprise Asset Management Solutions...

NO COMMENTS

LEAVE A REPLY Cancel reply