

Poster

An Instrumental Value for Data Production and its Application to Data Pricing

Rui Ai · Boxiang Lyu · Zhaoran Wang · Zhuoran Yang · Haifeng Xu

East Exhibition Hall A-B #E-1602
Thu 17 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract: We develop a framework for capturing the instrumental value of data production processes, which accounts for two key factors: (a) the context of the agent’s decision-making; (b) how much data or information the buyer already possesses. We "micro-found" our data valuation function by establishing its connection to classic notions of signals and information design in economics. When instantiated in Bayesian linear regression, our value naturally corresponds to information gain. Applying our proposed data value in Bayesian linear regression for monopoly pricing, we show that if the seller can fully customize data production, she can extract the first-best revenue (i.e., full surplus) from any population of buyers, i.e., achieving first-degree price discrimination. If data can only be constructed from an existing data pool, this limits the seller’s ability to customize, and achieving first-best revenue becomes generally impossible. However, we design a mechanism that achieves seller revenue at most $\log(\kappa)$ less than the first-best, where $\kappa$ is the condition number associated with the data matrix. As a corollary, the seller extracts the first-best revenue in the multi-armed bandits special case.
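To make the Bayesian linear regression case concrete, here is a minimal sketch of the textbook information gain for a Gaussian prior and Gaussian noise. It is not taken from the paper: the function names, the unit noise variance, and the two example buyers are assumptions chosen for illustration, and the paper's exact valuation may be normalized differently. The sketch shows how the same dataset is worth less to a buyer whose prior is already sharp in the direction the data measures.

```python
import numpy as np


def information_gain(X, prior_cov, noise_var=1.0):
    """Mutual information (in nats) between the regression weights and the
    responses observed at design points X, for a N(0, prior_cov) prior and
    i.i.d. Gaussian noise with variance noise_var.

    Uses the standard identity
        I(theta; y | X) = 0.5 * log det(I_n + X prior_cov X^T / noise_var).
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    prior_cov = np.asarray(prior_cov, dtype=float)
    n = X.shape[0]
    gram = np.eye(n) + X @ prior_cov @ X.T / noise_var
    _, logdet = np.linalg.slogdet(gram)
    return 0.5 * logdet


def posterior_cov(X, prior_cov, noise_var=1.0):
    """Posterior covariance of the weights after observing responses at X."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    precision = np.linalg.inv(prior_cov) + X.T @ X / noise_var
    return np.linalg.inv(precision)


# Hypothetical dataset: one observation along the first coordinate.
X = np.array([[1.0, 0.0]])

# Hypothetical buyer A knows little about either coordinate.
prior_a = np.eye(2)
# Hypothetical buyer B already knows the first coordinate well.
prior_b = np.diag([0.01, 1.0])

gain_a = information_gain(X, prior_a)
gain_b = information_gain(X, prior_b)
print(f"buyer A gain: {gain_a:.4f} nats")
print(f"buyer B gain: {gain_b:.4f} nats")
```

Under these assumptions, buyer A's gain is $\frac{1}{2}\log 2$ nats, while buyer B's is close to zero because the data mostly repeats what B already knows. This is the sense in which the value depends on how much information the buyer already possesses.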

Lay Summary:

How do we determine the value of data to an agent? It depends on the problem the agent is facing and the amount of information they already possess. From the perspective of rational agent decision-making, we propose an instrumental value framework that characterizes valid data valuation. Notably, we show that in the case of Bayesian linear regression, this value coincides with information gain. We then apply our instrumental value framework to a monopoly data pricing setting. We find that when the seller can perfectly customize data production, the buyer's surplus is zero, leading to severe market asymmetry and unfairness. In contrast, under limited customization, we derive an upper bound on the buyer's surplus. This prompts broader reflections on how to price such novel products in the data era and the resulting concerns about market fairness.
