Poster
Learning Imbalanced Data with Beneficial Label Noise
Guangzheng Hu · Feng Liu · Mingming Gong · Guanghui Wang · Liuhua Peng
East Exhibition Hall A-B #E-1505
Data imbalance is a common factor hindering classifier performance. Data-level approaches for imbalanced learning, such as resampling, often lead to information loss or generative errors. Building on theoretical studies of the imbalance ratio in binary classification, we find that adding suitable label noise can adjust biased decision boundaries and improve classifier performance. This paper proposes the Label-Noise-based Re-balancing (LNR) approach to address imbalanced learning by employing a novel design of an asymmetric label noise model. In contrast to other data-level methods, LNR alleviates the issues of information loss and generative errors and can be integrated seamlessly with any classifier or algorithm-level method. We validate the superiority of LNR on synthetic and real-world datasets. Our work opens a new avenue for imbalanced learning, highlighting the potential of beneficial label noise.
Machine learning struggles when one category (like fraudulent transactions) is vastly outnumbered by another (like normal transactions). Traditional fixes, such as deleting common examples or creating fake rare ones, often lose critical information or produce unrealistic data.

We propose LNR, a simple but effective solution: we intentionally mislabel a small number of common examples as rare, for instance labeling some suspicious "normal transactions" as "fraudulent" to stop the model from ignoring genuine fraud patterns. Unlike other methods, LNR preserves all original features, avoiding both information loss and unrealistic samples.

Tests across binary tabular data classification and multi-class image recognition tasks show LNR consistently improves rare-class recognition. Surprisingly, this shows that not all "label errors" are harmful; when applied strategically, they can enhance fairness. LNR's plug-and-play design makes it universally applicable to imbalance challenges in healthcare, finance, computer vision, and more, offering an easier way to help machine learning models see the "unseen."
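To make the idea concrete, below is a minimal sketch (not the paper's actual LNR noise model) of how injecting asymmetric label noise into an imbalanced binary dataset might look: a hypothetical fraction `flip_rate` of majority-class labels is flipped to the minority class before training an off-the-shelf classifier. All names and parameter values here are illustrative assumptions.

```python
# Illustrative sketch only: the paper's LNR method specifies its own asymmetric
# label-noise design; here we simply flip a hypothetical fraction of
# majority-class labels to the minority class and compare against a baseline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic imbalanced binary data: roughly 5% minority class (label 1).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def inject_asymmetric_noise(y, flip_rate=0.10, majority=0, minority=1):
    """Flip a fraction of majority-class labels to the minority class."""
    y_noisy = y.copy()
    maj_idx = np.where(y == majority)[0]
    n_flip = int(flip_rate * len(maj_idx))
    flip_idx = rng.choice(maj_idx, size=n_flip, replace=False)
    y_noisy[flip_idx] = minority
    return y_noisy

# Any classifier can be plugged in; logistic regression is just for illustration.
baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
noisy = LogisticRegression(max_iter=1000).fit(X_tr, inject_asymmetric_noise(y_tr))

print("baseline balanced accuracy:", balanced_accuracy_score(y_te, baseline.predict(X_te)))
print("with label noise          :", balanced_accuracy_score(y_te, noisy.predict(X_te)))
```

In this toy setup the flipped labels push the decision boundary toward the majority class, which tends to raise minority-class recall; the paper's contribution lies in choosing which labels to flip and at what rate, which this sketch does not reproduce.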