Poster in Workshop: 1st Workshop on Foundation Models for Structured Data (FMSD)
Leveraging Linguistic Concepts for Structured Data Extraction
Ahmed Nabhan · Suleiman Khan
We introduce a framework, grounded in basic principles of terminology theory, for improving the in-context learning capabilities of large language models (LLMs). In structured data extraction applications, prompts perform better when they include structured attribute definitions written according to specific patterns of term formation. Operational definitions proved more effective for measurement-related attributes such as weight and dimensions, while enumerative definitions worked better for descriptive attributes such as color or size. We developed a method for embedding these definition patterns in prompts, resulting in more effective LLM instructions for product data extraction tasks. We applied the method to the automatic generation of structured data extraction rules from linguistic patterns and few-shot examples, demonstrating that it can handle large-scale datasets efficiently without laborious prompt authoring. Evaluated on both in-house and external datasets, the framework significantly improved product data extraction performance.
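As a rough illustration of the idea of pairing attribute types with definition patterns, the following Python sketch renders an operational definition for measurement attributes and an enumerative definition for descriptive ones, then assembles them into an extraction prompt. This is not the authors' implementation: the names `AttributeSpec`, `render_definition`, and `build_prompt`, the definition wording, and the prompt layout are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class AttributeSpec:
    """Hypothetical description of one structured attribute."""
    name: str
    kind: str                  # "measurement" or "descriptive" (assumed labels)
    unit: str = ""             # used only for measurement attributes
    allowed_values: tuple = () # used only for descriptive attributes


def render_definition(attr: AttributeSpec) -> str:
    """Pick a definition pattern based on the attribute kind."""
    if attr.kind == "measurement":
        # Operational pattern: define the attribute by how its value is obtained.
        return (f"{attr.name}: the number obtained by measuring the product's "
                f"{attr.name}, reported in {attr.unit}.")
    if attr.kind == "descriptive":
        # Enumerative pattern: define the attribute by listing its members.
        values = ", ".join(attr.allowed_values)
        return f"{attr.name}: one of the following values: {values}."
    raise ValueError(f"unknown attribute kind: {attr.kind!r}")


def build_prompt(attrs, product_text: str) -> str:
    """Assemble an extraction prompt (the layout here is an assumption)."""
    lines = [
        "Extract the attributes defined below from the product description.",
        "",
        "Attribute definitions:",
    ]
    lines += [f"- {render_definition(a)}" for a in attrs]
    lines += [
        "",
        "Product description:",
        product_text,
        "",
        "Return a JSON object with one key per attribute.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    specs = [
        AttributeSpec(name="weight", kind="measurement", unit="kg"),
        AttributeSpec(name="color", kind="descriptive",
                      allowed_values=("red", "blue", "black")),
    ]
    print(build_prompt(specs, "Blue steel kettle, 1.2 kg"))
```

Keeping the pattern choice in one function means new attributes only need a kind and either a unit or a value list, which is one plausible way to avoid hand-authoring each prompt.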