EVMbench: An Open-Source Benchmark Evaluating AI Agents’ Proficiency in Mitigating Smart Contract Exploits

Open-source benchmark EVMbench tests how well AI agents handle smart contract exploits

Smart contract exploits continue to drain funds from blockchain projects despite advances in auditing and bug bounty programs. The Ethereum Virtual Machine (EVM) is a difficult environment to secure because smart contracts run autonomously and manage substantial assets. To address this, EVMbench, an open-source benchmark developed by OpenAI and Paradigm, evaluates AI systems on three smart contract security tasks in controlled settings: detecting vulnerabilities, patching code, and exploiting flaws. Built from 120 real-world vulnerabilities drawn from 40 audits, the dataset reflects genuine development conditions, which makes the tasks more demanding than synthetic examples. Exploit tasks are graded automatically in a sandboxed EVM, and the results show notable performance gaps between models. AI models have become more capable at exploiting vulnerabilities, but patching them remains a significant hurdle. EVMbench is freely available on GitHub and is intended to support consistent testing as model capabilities progress, which matters for safeguarding the $100B+ linked to smart contracts.
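To make the idea of automated grading for exploit tasks concrete, here is a minimal sketch of what a balance-based pass/fail check could look like. It is not EVMbench's actual grader: the `ExploitRun` record, the `grade_exploit` function, and the `min_drained_wei` threshold are hypothetical names chosen for illustration, and the sketch assumes the sandbox reports account balances in wei before and after the agent's attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExploitRun:
    """Hypothetical record of one exploit attempt in a sandboxed chain (balances in wei)."""
    attacker_before: int
    attacker_after: int
    victim_before: int
    victim_after: int


def grade_exploit(run: ExploitRun, min_drained_wei: int = 1) -> bool:
    """Pass only if the victim contract lost funds and the attacker gained funds.

    Assumption: success is judged purely from balance changes; a real
    grader may use other on-chain signals.
    """
    drained = run.victim_before - run.victim_after
    gained = run.attacker_after - run.attacker_before
    return drained >= min_drained_wei and gained > 0


# A run where the attacker drains 5 ETH from the victim contract.
one_eth = 10**18
successful = ExploitRun(
    attacker_before=1 * one_eth,
    attacker_after=6 * one_eth,
    victim_before=5 * one_eth,
    victim_after=0,
)
# A run where nothing changed, for example because the transaction reverted.
failed = ExploitRun(
    attacker_before=1 * one_eth,
    attacker_after=1 * one_eth,
    victim_before=5 * one_eth,
    victim_after=5 * one_eth,
)
print(grade_exploit(successful), grade_exploit(failed))
```

A check of this kind is deterministic and needs no human reviewer, which is one reason exploit tasks lend themselves to automated scoring more readily than open-ended audit reports.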
