Poster
Stray Intrusive Outliers-Based Feature Selection on Intra-Class Asymmetric Instance Distribution or Multiple High-Density Clusters
Lixin Yuan · Yirui Wu · Wenxiao Zhang · Minglei Yuan · Jun Liu
East Exhibition Hall A-B #E-1700
In many real-world datasets, such as images or medical records, data within the same class can follow complex patterns, such as uneven spreads or multiple dense clusters, which makes classes hard to tell apart. Some data points, called stray outliers, look more like another class than their own (e.g., a resort image mistaken for a school, or handwritten digits 4 and 9 that appear similar). Traditional feature selection (FS) methods treat all data points equally and ignore these critical outliers. This paper introduces a new FS method, SIOFS, which focuses on the stray outliers that intrude into the bodies of other classes. SIOFS identifies the main characteristic of each class using a refined statistical approach, which helps it find the features that best separate the classes. In tests on 16 diverse datasets, SIOFS outperformed 12 existing FS methods in accuracy and reliability. This advance is particularly useful for small or complex datasets where outliers and overlapping classes are common. The paper offers a way to mine patterns in such difficult data, improving automated classification in fields like healthcare or image recognition.
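To make the notion of a stray outlier concrete, here is a minimal Python sketch. It is not the SIOFS procedure, which this summary does not spell out. It assumes, purely for illustration, that each class body can be summarized by its coordinate-wise median, and that a point is "stray" when it lies closer to another class's median than to its own. The function name `find_stray_outliers` and the toy data are hypothetical.

```python
import numpy as np

def find_stray_outliers(X, y):
    """Flag points lying closer to another class's center than to their own.

    Illustrative assumption: the class "body" is summarized by the
    coordinate-wise median, a simple robust statistic. This is a stand-in,
    not the refined statistic used by SIOFS.
    """
    classes = np.unique(y)
    # One robust center per class: shape (n_classes, n_features).
    centers = np.stack([np.median(X[y == c], axis=0) for c in classes])
    # Euclidean distance from every point to every center:
    # shape (n_samples, n_classes).
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Label of the nearest center for each point.
    nearest = classes[np.argmin(dists, axis=1)]
    # A point is "stray" if its nearest center belongs to a different class.
    return nearest != y

# Toy data: two classes, each with one point sitting inside the other class.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [0.1, 0.1]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

mask = find_stray_outliers(X, y)
print(np.flatnonzero(mask))  # indices of flagged points: [3 7]
```

The median is used here because a single intruding point barely shifts it, whereas the mean would be pulled toward the intruder. A feature selection method built on this idea could then score features by how well they separate the flagged points from the class they intrude into, but that step is left out because the summary gives no details.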