Poster in Workshop: Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures
OAgents: An Empirical Study of Building Effective Agents
He Zhu · Tianrui Qin · Zhu · Heyuan Huang · Yeyi Guan · Jinxiang Xia · Hanhao Li · Yi Yao · Ningning Wang · Pai Liu · Tianhao Peng · Sunny Gui · Yuhui Liu · Robert Tang · Jian Yang · Ge Zhang · Xitong Gao · Yuchen Jiang · Changwang Zhang · Jun Wang · Jiaheng Liu · Wangchunshu Zhou
Agentic AI has recently become an increasingly popular field of research. However, we argue that current practices in agent research are far from standard, rigorous scientific research, which makes it hard to conduct apples-to-apples comparisons among and against existing methods. As a result, it remains unclear how different design choices in an agent framework affect its effectiveness, and measuring progress in agent research remains very hard. In this work, we conduct a systematic empirical study on the GAIA benchmark to investigate, in a fair and rigorous way, the impact of popular design choices within key agent components. To begin with, we find that the lack of a standard evaluation protocol makes previous works, even open-source ones, non-reproducible, and that the variance between random runs is often non-negligible. We therefore first introduce a more robust evaluation protocol to make comparisons more stable. Our empirical study then reveals which components and designs, as well as which correlations between these designs, are key to building effective agents, and which are redundant despite seeming sensible. With the insights gained from this study, we build and open-source OAgents, a new foundation agent framework that achieves state-of-the-art performance among open-source projects and provides a good starting point and guidelines for building effective agents. More importantly, OAgents supports various design choices for agent components in a modular way, facilitating future scientific research on Agentic AI.
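The concern about run-to-run variance can be illustrated with a minimal sketch. This is not the evaluation protocol introduced in the paper, which the abstract does not specify. It only shows one common way to report a mean and spread over repeated runs instead of a single score. The function name `summarize_runs` and the toy per-task outcomes are hypothetical and are not results from the study.

```python
from statistics import mean, stdev


def summarize_runs(run_results):
    """Aggregate accuracy over repeated runs of the same agent configuration.

    run_results: list of runs, each a list of booleans (True = task solved).
    Hypothetical helper for illustration; not part of OAgents.
    """
    # Accuracy of each individual run.
    accuracies = [sum(run) / len(run) for run in run_results]
    # Sample standard deviation needs at least two runs.
    spread = stdev(accuracies) if len(accuracies) > 1 else 0.0
    return {"per_run": accuracies, "mean": mean(accuracies), "std": spread}


# Toy data: three runs over the same four tasks (made-up values).
runs = [
    [True, False, True, True],
    [True, True, False, False],
    [True, False, True, False],
]
summary = summarize_runs(runs)
print(summary)
```

Reporting the mean together with the standard deviation makes it visible when the difference between two configurations is smaller than the variation between runs of a single configuration.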