ICML Poster The Noisy Laplacian: a Threshold Phenomenon for Non-Linear Dimension Reduction

Poster

The Noisy Laplacian: a Threshold Phenomenon for Non-Linear Dimension Reduction

Alex Kokot · Octavian-Vlad Murad · Marina Meila

East Exhibition Hall A-B #E-1200

[ Abstract ] [ Lay Summary ]

[ OpenReview]

Wed 16 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: In this paper, we clarify the effect of noise on common spectrallymotivated algorithms such as Diffusion Maps (DM) for dimensionreduction. Empirically, these methods are much more robust to noisethan current work suggests. Specifically, existing consistency resultsrequire that either the noise amplitude or dimensionality must varywith the sample size $n$. We provide new theoretical resultsdemonstrating that low-frequency eigenpairs reliably capture thegeometry of the underlying manifold under a constant noise level, up to a dimension independent threshold $O(r^{-2})$, where $r$ is the noise amplitude. Our results rely on a decomposition of the manifold Laplacian in the Sasakimetric, a technique not used before in this area, to our knowledge. We experimentally validate our theoretical predictions. Additionally, we observesimilar robust behavior for other manifold learning algorithms which are not based on computing the Laplacian, namely LTSA and VAE.

Lay Summary:

Dimension reduction is the task of reducing the complexity of data while maintaining the integrity of essential information. Diffusion Maps is representative of one approach to this problem, where geometric information of a dataset is captured by an embedding related to a fundamental differential operator. The typical setting for these studies is for manifold valued data, data with a low-dimensional latent structure. It has been shown that this structure is essentially preserved by this procedure. In practice, even data with low-dimensional structure likely has some level of noise, and in this work we show what recovery is possible in this setting. Our key result is that, up to a noise dependent threshold, intrinsic manifold information can be approximated up to a small error using methods like Diffusion Maps. However, beyond this point, any larger embedding will capture mostly extraneous information due to the noise structure. We make this precise by leveraging a particular decomposition of noisy manifold data, the Sasaki metric, and we show that it allows for a neat disentanglement of noise and manifold components. These results can be related to real data by a perturbation argument.This culminates in a new perspective on why spectral embeddings such as Diffusion Maps are effective. While previous work emphasizes that recovery of manifold geometry is possible for truly low-dimensional data, we show that for very high dimensional data contaminated by noise, low-dimensional structure can still be revealed by these procedures. In this sense, the data is being implicitly denoised, providing valuable insight on its underlying structure. We observe a similar behavior for LTSA and VAEs, indicating a generality to this phenomenon.

Chat is not available.