

Poster in Workshop: 2nd Workshop on Models of Human Feedback for AI Alignment (MoFA)

Self-Concordant Preference Learning from Noisy Labels

Shiv Shankar · Madalina Fiterau


Abstract:

Preference learning is an integral part of the training process for a large language model (LLM) to serve user applications. While this alignment is usually done via offline learning from annotated feedback, the process of obtaining such data is inherently noisy, and most current methods are sensitive to this noise. In this work, we propose a novel approach to learning from such noisy labels based on self-concordant losses. Our method is based on learning the optimal model under an adversarial labeller. Experiments show that our proposal is more effective than common algorithms across various levels of label noise.
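For context, the sketch below shows one standard way noisy preference labels are handled in a Bradley-Terry-style reward model: backward loss correction under an assumed symmetric flip rate. This is not the paper's self-concordant loss, only a common baseline technique; the PyTorch setup and the `flip_rate` parameter are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def noise_corrected_preference_loss(reward_chosen: torch.Tensor,
                                    reward_rejected: torch.Tensor,
                                    flip_rate: float = 0.1) -> torch.Tensor:
    """Bradley-Terry preference loss with backward noise correction.

    Illustrative only: this is the classic unbiased-loss correction for
    symmetric label flips (Natarajan et al., 2013), NOT the paper's
    self-concordant method. `flip_rate` is the assumed probability that
    an annotator flipped the preference label; it must be < 0.5.
    """
    margin = reward_chosen - reward_rejected
    # Negative log-likelihood under Bradley-Terry for each label direction.
    loss_as_labelled = F.softplus(-margin)  # -log sigmoid(margin)
    loss_if_flipped = F.softplus(margin)    # -log sigmoid(-margin)
    # Reweight so the expected loss under noisy labels equals the
    # clean-label loss in expectation.
    corrected = ((1 - flip_rate) * loss_as_labelled
                 - flip_rate * loss_if_flipped) / (1 - 2 * flip_rate)
    return corrected.mean()

# Example: rewards for 4 (chosen, rejected) response pairs.
loss = noise_corrected_preference_loss(torch.randn(4), torch.randn(4))
loss.backward() if loss.requires_grad else None
```

A known limitation of this correction, which motivates noise-robust losses more generally, is that the subtraction can make the per-example loss unbounded below, destabilizing training at high noise rates.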
