Poster
The Noisy Laplacian: a Threshold Phenomenon for Non-Linear Dimension Reduction
Alex Kokot · Octavian-Vlad Murad · Marina Meila
East Exhibition Hall A-B #E-1200
Dimension reduction is the task of reducing the complexity of data while maintaining the integrity of essential information. Diffusion Maps is representative of one approach to this problem, where geometric information of a dataset is captured by an embedding related to a fundamental differential operator. The typical setting for these studies is for manifold valued data, data with a low-dimensional latent structure. It has been shown that this structure is essentially preserved by this procedure. In practice, even data with low-dimensional structure likely has some level of noise, and in this work we show what recovery is possible in this setting. Our key result is that, up to a noise dependent threshold, intrinsic manifold information can be approximated up to a small error using methods like Diffusion Maps. However, beyond this point, any larger embedding will capture mostly extraneous information due to the noise structure. We make this precise by leveraging a particular decomposition of noisy manifold data, the Sasaki metric, and we show that it allows for a neat disentanglement of noise and manifold components. These results can be related to real data by a perturbation argument.This culminates in a new perspective on why spectral embeddings such as Diffusion Maps are effective. While previous work emphasizes that recovery of manifold geometry is possible for truly low-dimensional data, we show that for very high dimensional data contaminated by noise, low-dimensional structure can still be revealed by these procedures. In this sense, the data is being implicitly denoised, providing valuable insight on its underlying structure. We observe a similar behavior for LTSA and VAEs, indicating a generality to this phenomenon.