Invited Talk
in
Workshop: DIG-BUGS: Data in Generative Models (The Bad, the Ugly, and the Greats)

On Specification Data

Serena Booth

Sat 19 Jul 3:45 p.m. PDT — 4:15 p.m. PDT

Abstract:

Specification design is a critical question in AI because specifications are prone to overspecification, underspecification, and misspecification. We will discuss how experts write reward functions for reinforcement learning (RL) and how non-experts provide preferences for reinforcement learning from human feedback (RLHF). I will show evidence that experts are bad at writing reward functions: even in a trivial setting, experts write specifications that are overfit to a particular RL algorithm, and they often write erroneous specifications for agents that fail to encode their true intent. Next, I will show that the common approach to learning a reward function from non-experts in RLHF uses an inductive bias that fails to encode how humans express preferences, and that our proposed bias better encodes human preferences both theoretically and empirically.
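As background for the RLHF discussion, the inductive bias commonly used when learning a reward function from preferences is a Bradley-Terry (logistic) model over the partial returns of two trajectory segments. The sketch below is a minimal illustration of that standard model, not the speaker's proposed alternative; the function names and the undiscounted-sum assumption are ours.

```python
import math

def partial_return(rewards):
    # Undiscounted sum of per-step rewards along a trajectory segment
    # (an assumption of this sketch; discounting is often omitted in
    # segment-level preference models).
    return sum(rewards)

def preference_prob(seg_a, seg_b):
    # Bradley-Terry / logistic model: the probability a human is
    # predicted to prefer seg_a over seg_b is a sigmoid of the
    # difference in partial returns -- the inductive bias the
    # abstract says fails to capture how humans actually judge.
    diff = partial_return(seg_a) - partial_return(seg_b)
    return 1.0 / (1.0 + math.exp(-diff))

# Two segments with equal partial return are modeled as a coin flip,
# regardless of how the reward is distributed across steps.
p = preference_prob([1.0, 1.0], [2.0, 0.0])  # 0.5 under this model
```

Under this bias, only the summed reward of a segment matters; any structure in how humans weigh outcomes beyond partial return is invisible to the model, which is the gap the talk's proposed alternative targets.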
