Spotlight Poster
All-Purpose Mean Estimation over R: Optimal Sub-Gaussianity with Outlier Robustness and Low Moments Performance
Jasper Lee · Walter McKelvie · Maoyuan Song · Paul Valiant
West Exhibition Hall B2-B3 #W-802
Thu 17 Jul 10 a.m. PDT — 11 a.m. PDT
Suppose we have a large population of numbers (say, the individual incomes of people in a country) and we want to estimate the population mean by sampling. The conventional method is to draw a number of samples and compute their average, in the hope that it is a reasonable extrapolation. However, the sample average is very sensitive to extreme values, which can appear in our data set if we get unlucky in our sampling. Moreover, real-world data collection can introduce errors, for example through mistakes in data entry or even malicious meddling by bad actors. This paper mathematically proves that the recent Lee and Valiant mean estimator achieves essentially the smallest possible error in a wide variety of settings, including badly behaved populations where extreme values are relatively common, and settings where the data is corrupted.
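
To make the sensitivity of the sample average concrete, here is a minimal sketch in Python. The income figures are invented for illustration. The robust estimator shown is the classical median-of-means, chosen only because it is short to write down. It is not the Lee and Valiant estimator analyzed in the paper, and the helper name `median_of_means` is our own.

```python
from statistics import mean, median


def median_of_means(samples, num_groups):
    """Average each of num_groups contiguous groups, then take the median.

    Illustrative stand-in for a robust estimator; NOT the Lee and Valiant
    estimator. Samples left over after even grouping are ignored.
    """
    group_size = len(samples) // num_groups
    group_means = [
        mean(samples[i * group_size:(i + 1) * group_size])
        for i in range(num_groups)
    ]
    return median(group_means)


# Hypothetical incomes (in thousands). The last value stands in for a
# data-entry error or an unluckily sampled extreme value.
incomes = [40, 42, 38, 41, 39, 43, 40, 37, 41, 4000]

plain_average = mean(incomes)                              # about 436.1
robust_estimate = median_of_means(incomes, num_groups=5)   # 41

print(plain_average, robust_estimate)
```

A single bad value drags the plain average to roughly ten times the typical income, while the grouped estimate stays near 40 because only one of the five group averages is affected.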