

Poster in Workshop: 1st Workshop on Foundation Models for Structured Data (FMSD)

RECoRD: A Multi-Agent LLM Framework for Reverse Engineering Codebase to Relational Diagram

Yuan Xue · Xiaoyu Lu · Yunfei Bai · Hoiyi Ng · Yunan Liu


Abstract:

Understanding the behavior and logical structure of complex algorithms is a fundamental challenge in industrial systems. Recent advancements in large language models (LLMs) have demonstrated remarkable code-understanding capabilities. However, their potential for reverse engineering algorithms into interpretable causal structures remains unexplored. In this work, we develop a multi-agent framework, RECoRD, that leverages LLMs for Reverse Engineering Codebase to Relational Diagram. RECoRD uses reinforcement fine-tuning (RFT) to enhance the reasoning accuracy of the relation-extraction agent. Fine-tuning on expert-curated causal graphs allows smaller specialized models to outperform larger foundation models on domain-specific tasks. The RFT-trained models significantly outperformed their foundation counterparts, improving the F1 score from 0.69 to 0.97. RECoRD also exhibited strong generalization, with models fine-tuned on one use case improving performance on others. By automating the construction of interpretable causal models from code, RECoRD has wide-ranging applications in areas such as software debugging, operational optimization, and risk management.
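To make the relation-extraction step concrete, the sketch below shows one way a code snippet could be turned into a directed relational diagram. It is a minimal illustration, not the authors' implementation: the `extract_relations` stub, the toy pricing snippet, and the use of networkx are all assumptions standing in for the RFT-tuned LLM agent described in the abstract.

```python
# Illustrative sketch (assumed, not RECoRD's actual pipeline): turn a code
# snippet into a directed relational diagram of (cause, effect) edges.
import networkx as nx

CODE_SNIPPET = """
def compute_price(base_cost, demand, inventory):
    surge = 1.0 + 0.2 * demand / max(inventory, 1)
    return base_cost * surge
"""


def extract_relations(code: str) -> list[tuple[str, str]]:
    """Hypothetical stand-in for the relation-extraction agent.

    A real agent would prompt an LLM with the code and parse its structured
    output; this stub returns the causal edges a human reader would extract,
    so the example stays runnable and self-contained.
    """
    return [
        ("demand", "surge"),
        ("inventory", "surge"),
        ("surge", "price"),
        ("base_cost", "price"),
    ]


def build_diagram(code: str) -> nx.DiGraph:
    """Assemble the extracted edges into a relational (causal) diagram."""
    graph = nx.DiGraph()
    graph.add_edges_from(extract_relations(code))
    return graph


if __name__ == "__main__":
    diagram = build_diagram(CODE_SNIPPET)
    print(sorted(diagram.edges()))
    # [('base_cost', 'price'), ('demand', 'surge'), ('inventory', 'surge'), ('surge', 'price')]
```

Under this framing, extraction quality can be scored by comparing the predicted edge set against an expert-curated graph, which is consistent with the F1-based evaluation reported in the abstract.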
