Researchers from Imperial College London and Ant Group have developed M-GRPO, a new framework for training artificial intelligence (AI) agents to tackle complex tasks collaboratively. Traditional single-agent systems struggle with long decision chains, in which an early mistake can propagate through every later step. M-GRPO instead uses a main agent for planning and several sub-agents that carry out the delegated subtasks. This vertical multi-agent structure mirrors real-world workflows, in which an AI system must search for, analyze, and retrieve information using a variety of tools.
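To make the vertical structure concrete, here is a minimal toy sketch, assuming a hypothetical `MainAgent` that plans and a hypothetical `SubAgent` that executes one subtask with one tool. The lambda "tools" are stubs standing in for real search or analysis tools, and none of this reflects the researchers' actual implementation. It also illustrates that the number of sub-agent calls depends on the plan, not on a fixed schedule.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubAgent:
    """Executes one delegated subtask with a single tool (hypothetical)."""
    name: str
    tool: Callable[[str], str]

    def run(self, subtask: str) -> str:
        return self.tool(subtask)


@dataclass
class MainAgent:
    """Plans: splits a task into subtasks, delegates each, combines results."""
    sub_agents: dict

    def plan(self, task: str) -> list:
        # Toy planner: each semicolon-separated clause "agent: subtask"
        # becomes one (agent name, subtask) step.
        steps = []
        for clause in task.split(";"):
            agent_name, _, subtask = clause.strip().partition(":")
            steps.append((agent_name.strip(), subtask.strip()))
        return steps

    def solve(self, task: str) -> tuple:
        results = [self.sub_agents[name].run(sub) for name, sub in self.plan(task)]
        # Return the combined answer and how many sub-agent calls were made.
        return " | ".join(results), len(results)


# Stub tools standing in for real search/analysis tools.
agents = {
    "search": SubAgent("search", lambda q: f"docs for '{q}'"),
    "analyze": SubAgent("analyze", lambda q: f"summary of '{q}'"),
}
planner = MainAgent(agents)
answer, calls = planner.solve("search: M-GRPO paper; analyze: benchmark results")
```

Here the main agent makes one planning decision per task, while the sub-agents are called once per planned step (twice in this example).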
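M-GRPO extends Group Relative Policy Optimization (GRPO), a reinforcement-learning method that scores each sampled response against the other responses in its group rather than against a separately learned value model. M-GRPO trains the agents in a decoupled pipeline and credits each agent for its own contribution, with training statistics exchanged through a shared buffer. This design lets the agents be trained together even though they operate at different frequencies: the main agent acts once per task, while the sub-agents may be invoked a varying number of times.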
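The group-relative idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the researchers' code: the `Rollout` class, the `multi_agent_advantages` function, the parameter `k`, the rule that each sub-agent trajectory inherits its rollout's outcome reward, and the random drop/duplicate scheme for aligning a variable number of sub-agent calls to a fixed count are all hypothetical choices made for this example.

```python
import random
from dataclasses import dataclass, field
from statistics import mean, pstdev

# Illustrative sketch only: names, reward sharing, and alignment are assumptions.


@dataclass
class Rollout:
    """One attempt at a task: the main agent's outcome reward plus a
    variable number of sub-agent trajectories."""
    reward: float
    sub_trajectories: list = field(default_factory=list)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: each reward minus the group mean, divided by
    the group standard deviation (eps avoids division by zero)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def align_sub_trajectories(subs, k, rng):
    """Return exactly k sub-agent trajectories by randomly dropping extras
    or duplicating random ones (an assumed alignment scheme)."""
    if not subs:
        return []  # nothing to align; this rollout adds no sub-agent samples
    if len(subs) >= k:
        return rng.sample(subs, k)
    return subs + [rng.choice(subs) for _ in range(k - len(subs))]


def multi_agent_advantages(group, k, seed=0):
    """Advantages for the main agent and for the aligned sub-agent batch."""
    rng = random.Random(seed)
    # Main agent: one trajectory per rollout, compared within the group.
    main_adv = group_relative_advantages([r.reward for r in group])
    # Sub-agents: k trajectories per rollout, each inheriting its rollout's
    # reward (a simplifying assumption), then normalized across the batch.
    sub_batch, sub_rewards = [], []
    for rollout in group:
        for traj in align_sub_trajectories(rollout.sub_trajectories, k, rng):
            sub_batch.append(traj)
            sub_rewards.append(rollout.reward)
    sub_adv = group_relative_advantages(sub_rewards) if sub_rewards else []
    return main_adv, list(zip(sub_batch, sub_adv))


# Four rollouts of the same task; sub-agents were called 3, 1, 2 and 4 times.
group = [
    Rollout(1.0, ["s1", "s2", "s3"]),
    Rollout(0.0, ["s4"]),
    Rollout(1.0, ["s5", "s6"]),
    Rollout(0.0, ["s7", "s8", "s9", "s10"]),
]
main_adv, sub_adv = multi_agent_advantages(group, k=2)
```

Because advantages are computed relative to the group, no separate value network is needed: a rollout is reinforced only to the extent that it beat its siblings. Fixing `k` keeps the sub-agent batch the same size (here 4 rollouts × 2 = 8 samples) no matter how often the sub-agents were actually called, provided every rollout made at least one call.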
When evaluated on real-world benchmarks (WebWalkerQA, XBench DeepSearch, and GAIA), M-GRPO outperformed the baselines it was compared against and trained more stably, with better sample efficiency, pointing to stronger decision-making in multi-agent AI systems.
