Poster
Sharp Optimality of Simple, Plug-in Estimation of the Fisher Information of a Smoothed Density
Subhodh Kotekal
West Exhibition Hall B2-B3 #W-812
It is often difficult to estimate even simple functionals of an unknown, data-generating distribution given access only to samples, particularly when the distribution is complicated. The naive plug-in estimator, which simply evaluates the target functional on the empirical distribution of the data points, is typically statistically suboptimal. Classical work has achieved optimal estimation through intricate techniques, but these methodologies tend to be brittle and tailored to specific problems. A quite different approach recently proposed in the literature is to add noise to the data, with the idea that the noised distribution becomes simpler, looking more and more like the noise distribution; it may then be optimal to simply evaluate the target functional on the empirical distribution of the noised data points. In this paper, we sharply characterize the statistical limits of estimating a fundamental functional, the Fisher information, of a smooth density corrupted by additive Gaussian noise. Interestingly, there is a critical noise level below which the problem is actually harder than if there were no noise at all. Nevertheless, we show that the simple plug-in strategy is indeed optimal. Using fundamental identities, we describe how our results extend to the estimation of other information-theoretic quantities, such as the entropy and the mutual information. Our results provide support for the noising approach.
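To make the plug-in strategy concrete, here is a minimal sketch (not the paper's exact estimator; the function name, grid parameters, and quadrature scheme are illustrative assumptions). Convolving the empirical measure of samples X_1, ..., X_n with N(0, sigma^2) yields a Gaussian mixture density g, whose Fisher information J(g) = \int g'(t)^2 / g(t) dt can be computed by numerical quadrature:

import numpy as np

def plugin_fisher_information(samples, sigma, grid_size=2000, tail=8.0):
    """Plug-in estimate of the Fisher information J(f * N(0, sigma^2)).

    Convolves the empirical measure of the samples with a Gaussian of
    standard deviation sigma, giving the mixture density
        g(t) = (1/n) * sum_i phi_sigma(t - X_i),
    then computes J(g) = integral of g'(t)^2 / g(t) dt on a grid.
    """
    x = np.asarray(samples, dtype=float)
    t = np.linspace(x.min() - tail * sigma, x.max() + tail * sigma, grid_size)
    z = (t[:, None] - x[None, :]) / sigma           # standardized distances
    phi = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
    g = phi.mean(axis=1)                            # mixture density g(t)
    g_prime = (-z / sigma * phi).mean(axis=1)       # its derivative g'(t)
    integrand = g_prime**2 / np.maximum(g, 1e-300)  # squared score weighted by g
    return float(np.sum(integrand) * (t[1] - t[0])) # Riemann-sum quadrature

# Sanity check: if X ~ N(0, 1), then X + sigma*Z ~ N(0, 1 + sigma^2),
# whose Fisher information is 1 / (1 + sigma^2).
rng = np.random.default_rng(0)
est = plugin_fisher_information(rng.standard_normal(2000), sigma=0.5)
print(est)  # should be close to 1 / 1.25 = 0.8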
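One standard identity of the kind the abstract alludes to is de Bruijn's identity, which links the entropy of the noised variable to the Fisher information along the noise path (a well-known fact, stated here for orientation rather than quoted from the paper):

$$\frac{d}{dt}\, h\!\left(X + \sqrt{t}\,Z\right) \;=\; \frac{1}{2}\, J\!\left(X + \sqrt{t}\,Z\right), \qquad Z \sim N(0,1) \text{ independent of } X.$$

Integrating Fisher-information estimates over noise levels thus yields entropy estimates, and mutual information in the additive Gaussian channel follows from $I(X;\, X + \sigma Z) = h(X + \sigma Z) - h(\sigma Z)$.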