

Poster in Workshop: 2nd Workshop on Models of Human Feedback for AI Alignment (MoFA)

Self-Concordant Preference Learning from Noisy Labels

Shiv Shankar · Madalina Fiterau


Abstract:

Preference learning is an integral part of the training process for a large language model (LLM) to serve user applications. While this alignment is usually done via offline learning from annotated feedback, the process of obtaining such data is inherently noisy, and most current methods are sensitive to this noise. In this work, we propose a novel approach to learning from such noisy labels based on self-concordant losses. Our method is based on learning the optimal model under an adversarial labeller. Experiments show that our proposal is more effective than common algorithms across various levels of label noise.
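For context, the sketch below shows one standard way noisy preference labels are handled in a Bradley-Terry-style reward model: backward loss correction under an assumed symmetric flip rate. This is not the paper's self-concordant loss, only a common baseline technique; the PyTorch setup and the `flip_rate` parameter are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def noise_corrected_preference_loss(reward_chosen: torch.Tensor,
                                    reward_rejected: torch.Tensor,
                                    flip_rate: float = 0.1) -> torch.Tensor:
    """Bradley-Terry preference loss with backward noise correction.

    Illustrative only: this is the classic unbiased-loss correction for
    symmetric label flips (Natarajan et al., 2013), NOT the paper's
    self-concordant method. `flip_rate` is the assumed probability that
    an annotator flipped the preference label; it must be < 0.5.
    """
    margin = reward_chosen - reward_rejected
    # Negative log-likelihood under Bradley-Terry for each label direction.
    loss_as_labelled = F.softplus(-margin)  # -log sigmoid(margin)
    loss_if_flipped = F.softplus(margin)    # -log sigmoid(-margin)
    # Reweight so the expected loss under noisy labels equals the
    # clean-label loss in expectation.
    corrected = ((1 - flip_rate) * loss_as_labelled
                 - flip_rate * loss_if_flipped) / (1 - 2 * flip_rate)
    return corrected.mean()

# Example: rewards for 4 (chosen, rejected) response pairs.
loss = noise_corrected_preference_loss(torch.randn(4), torch.randn(4))
loss.backward() if loss.requires_grad else None
```

A known limitation of this correction, which motivates noise-robust losses more generally, is that the subtraction can make the per-example loss unbounded below, destabilizing training at high noise rates.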
