Poster
in
Workshop: Assessing World Models: Methods and Metrics for Evaluating Understanding
Understanding Large Language Models' Ability on Interdisciplinary Research
Yuanhao Shen · Daniel de Sousa · Ricardo de Andrade Nascimento · Ali Asad · Hongyu Guo · Xiaodan Zhu
Keywords: [ Dataset ] [ NLP for Science ] [ Interdisciplinary Research ] [ Large Language Models ] [ Benchmark ]
This work introduces IDRBench, a pioneering benchmark featuring an expert-annotated dataset and a suite of tasks tailored to evaluate LLMs' capabilities in proposing valuable research ideas for Interdisciplinary Research (IDR). To ensure a reliable evaluation, our dataset consists of scientific publications sourced from the arXiv platform covering six distinct disciplines and is annotated by domain experts with diverse academic backgrounds. The design of evaluation tasks in IDRBench follows a progressive, real-world perspective, reflecting the natural stages of interdisciplinary research development: 1) IDR Paper Identification, 2) IDR Idea Integration, and 3) IDR Idea Recommendation. Using IDRBench, we construct baselines across 10 LLMs and observe that, despite showing some level of IDR awareness, LLMs still struggle to produce quality IDR ideas. These findings could not only spark new research directions, but also help to develop next-generation LLMs that excel in interdisciplinary research.