Poster in Workshop: DataWorld: Unifying data curation frameworks across domains
Do Data Valuations Make Good Data Prices?
Dongyang Fan · Tyler Rotello · Sai Praneeth Reddy Karimireddy
Keywords: [ RAG ] [ Data Market ] [ Data Valuation ]
Abstract:
As large language models increasingly rely on external data sources, compensating data contributors has become a central concern. But how should these payments be devised? We revisit data valuations from a $\textit{market-design perspective}$ where payments serve to compensate data owners for the $\textit{private}$ heterogeneous costs they incur for collecting and sharing data. We show that popular valuation methods, such as Leave-One-Out and Data Shapley, make for poor payments: they fail to ensure truthful reporting of costs, leading to $\textit{inefficient market}$ outcomes. To address this, we adapt well-established payment rules from mechanism design, namely Myerson and Vickrey-Clarke-Groves (VCG), to the data market setting. We show that the Myerson payment is the minimal truthful mechanism and is therefore optimal from the buyer’s perspective. Additionally, we identify a condition under which both data buyers and sellers are utility-satisfied and the market achieves efficiency. Our findings highlight the importance of incorporating incentive compatibility into data valuation, paving the way for more robust and efficient data markets. Our data market framework is readily applicable to real-world scenarios, which we illustrate with an example of author compensation in a retrieval-augmented generation (RAG) market.
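To make the notion of a truthful payment concrete, here is a minimal sketch of a textbook threshold (critical-value) payment in a toy procurement auction. It is not the paper's market model: the rule "buy from the k lowest bidders", the example costs, and the function names `select_and_pay`, `seller_utility`, and `max_gain_from_misreport` are all assumptions made for illustration. Each winner is paid the highest bid at which it would still have won, which is what Myerson-style payments reduce to under a monotone 0/1 allocation.

```python
from typing import List, Tuple


def select_and_pay(bids: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Toy rule (assumed): buy from the k lowest bidders, ties broken by index.

    Each winner is paid the (k+1)-th lowest bid, i.e. the threshold bid
    above which it would no longer be selected.
    """
    if not 0 < k < len(bids):
        raise ValueError("need 0 < k < number of sellers")
    order = sorted(range(len(bids)), key=lambda i: (bids[i], i))
    winners = order[:k]
    threshold = bids[order[k]]
    payments = [0.0] * len(bids)
    for i in winners:
        payments[i] = threshold
    return sorted(winners), payments


def seller_utility(i: int, true_cost: float, bids: List[float], k: int) -> float:
    """Payment minus true cost if seller i is selected, otherwise zero."""
    winners, payments = select_and_pay(bids, k)
    return payments[i] - true_cost if i in winners else 0.0


def max_gain_from_misreport(costs: List[float], k: int, grid: List[float]) -> float:
    """Largest utility gain any single seller gets by deviating to a grid bid,
    while all other sellers report their true costs."""
    best = 0.0
    for i, cost in enumerate(costs):
        truthful = seller_utility(i, cost, costs, k)
        for b in grid:
            bids = list(costs)
            bids[i] = b
            best = max(best, seller_utility(i, cost, bids, k) - truthful)
    return best


# Hypothetical private costs of four data sellers; the buyer wants two datasets.
costs = [3.0, 5.0, 8.0, 2.0]
winners, payments = select_and_pay(costs, k=2)
grid = [x / 2 for x in range(0, 21)]  # candidate misreports 0.0, 0.5, ..., 10.0
gain = max_gain_from_misreport(costs, k=2, grid=grid)
```

In this toy instance no seller can gain by misreporting on the tested grid, because a winner's payment depends only on the other sellers' bids. A payment computed from a data valuation score alone carries no such guarantee by construction, which is the gap the abstract points to.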