Poster
in
Workshop: 2nd Workshop on Models of Human Feedback for AI Alignment (MoFA)
Playing the Data: Video Games as a Tool to Annotate and Train Models on Large Datasets
Parham Ghasemloo Gheidari · Kai-Hsiang Chang · Roman Sarrazin-Gendron · Renata Mutalova · Alexander Butyaev · Attila Szantner · Jérôme Waldispühl
Citizen science platforms can generate vast quantities of labeled data by engaging non-expert human contributors in solving tasks relevant to AI model development. In this work, we present insights from two deployed citizen science projects—Borderlands Science and Project Discovery—that have engaged millions of participants in annotating complex biological data. We discuss how human feedback collected via these platforms can be used to train or fine-tune AI models, with implications for learning from noisy demonstrations, preference aggregation, and biological discovery inspired by innate human intuition. We demonstrate how data from citizen science can be systematically used to train and evaluate machine learning models for biological sequence alignment and clustering, and we propose a framework for aggregating and leveraging noisy human strategies at scale.