Poster
Improving Your Model Ranking on Chatbot Arena by Vote Rigging
Rui Min · Tianyu Pang · Chao Du · Qian Liu · Minhao Cheng · Min Lin
East Exhibition Hall A-B #E-2311
Chatbot Arena is a popular leaderboard for large models, where users vote for the preferred response between two randomly sampled anonymous models. With millions of crowdsourced votes, Chatbot Arena is often regarded as the community's definitive leaderboard. But is it truly reliable? We systematically investigate this question, and our findings show that Chatbot Arena can be manipulated with only hundreds of rigged votes. We propose two rigging strategies, the target-only strategy and the omnipresent strategy, both aimed at improving the ranking of our target model. Notably, omnipresent rigging can exploit any new vote for ranking promotion, even when the target model is not one of the two models sampled for that battle. This makes vote rigging more efficient and harder to detect under various defense mechanisms. To support future research on this problem, we provide a general-purpose rigging framework and have open-sourced all our implementations. We hope our paper will spark broader discussion and encourage the community to develop more robust defense mechanisms against these rigging vulnerabilities.
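The abstract does not say which rating model is attacked, so the sketch below assumes a simplified zero-sum online Elo update as a stand-in (Chatbot Arena's published leaderboard has been computed with a Bradley-Terry fit, not this online rule). It illustrates why a vote in a battle that does not involve the target can still move the target's rank: the relative order of the models around it changes. The names `K_FACTOR`, `rigged_vote`, `objective` and `simulate`, the greedy "simulate both outcomes" rule, and the toy ratings are all hypothetical illustrations, not the authors' released framework or their exact strategies.

```python
import random

K_FACTOR = 4.0  # hypothetical Elo step size, chosen only for illustration


def expected_score(r_a: float, r_b: float) -> float:
    """Elo probability that a model rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(ratings: dict, winner: str, loser: str, k: float = K_FACTOR) -> None:
    """Apply one zero-sum online Elo update in place."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def rank_of(ratings: dict, model: str) -> int:
    """1-based leaderboard position (a higher rating ranks first)."""
    return 1 + sum(1 for m, r in ratings.items() if m != model and r > ratings[model])


def objective(ratings: dict, target: str) -> tuple:
    """Lower is better: (target rank, rating gap to the nearest model above it)."""
    above = [r for m, r in ratings.items() if m != target and r > ratings[target]]
    gap = min(above) - ratings[target] if above else 0.0
    return (rank_of(ratings, target), gap)


def rigged_vote(ratings: dict, a: str, b: str, target: str) -> tuple:
    """Return (winner, loser) for one battle, chosen to help `target` (hypothetical)."""
    if target in (a, b):
        # Simplified target-only idea: if the target is sampled, vote for it.
        return (target, b if a == target else a)
    # Simplified omnipresent idea: the target is absent, so simulate both
    # outcomes and keep the one that leaves the target better placed.
    best = None
    for winner, loser in ((a, b), (b, a)):
        trial = dict(ratings)  # shallow copy is enough for a dict of floats
        elo_update(trial, winner, loser)
        score = objective(trial, target)
        if best is None or score < best[0]:
            best = (score, winner, loser)
    return best[1], best[2]


def simulate(ratings: dict, target: str, n_votes: int, seed: int = 0) -> dict:
    """Feed n_votes randomly paired battles through the rigged voting policy."""
    rng = random.Random(seed)
    models = sorted(ratings)
    ratings = dict(ratings)  # do not mutate the caller's leaderboard
    for _ in range(n_votes):
        a, b = rng.sample(models, 2)
        winner, loser = rigged_vote(ratings, a, b, target)
        elo_update(ratings, winner, loser)
    return ratings


# Toy leaderboard (made-up numbers): the target starts in 4th place.
start = {"m1": 1200.0, "m2": 1180.0, "m3": 1160.0, "target": 1150.0, "m5": 1100.0}
final = simulate(start, "target", n_votes=300)
print("rank before:", rank_of(start, "target"), "after:", rank_of(final, "target"))
```

In this toy, battles without the target never change the target's own rating; the omnipresent branch helps only by pushing down the model just above it (for example, in an `m3` vs `m5` battle it votes for `m5`). Under a Bradley-Terry fit over all votes, such battles can also shift the target's fitted score, which this sketch does not model.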