Poster
Dimensionality Reduction on Complex Vector Spaces for Euclidean Distance with Dynamic Weights
Simone Moretti · Paolo Pellizzoni · Francesco Silvestri
West Exhibition Hall B2-B3 #W-1009
Machine learning models typically work with high-dimensional data, such as a document represented by thousands of words or a user profile described by hundreds of preferences. To make computations faster and more efficient, researchers use dimensionality reduction: a technique that compresses data into a smaller number of dimensions while preserving important information, such as the distances between data points.

In many real-world applications, not all features (i.e., dimensions) are equally important. For instance, in recommendation systems, some words in a document carry more weight than others. If the importance of each feature is known beforehand, it can be accounted for during dimensionality reduction. But what happens if we only learn which features are important after the data has already been compressed?

This paper addresses that challenge. It introduces a novel method that reduces the dimensionality of data in a way that is agnostic to future feature importance, yet still allows accurate distance estimation once those weights become known. To do this, the paper leverages complex numbers (i.e., numbers involving the square root of -1) as a mathematical tool. The proposed method compresses the original data into a complex vector space using a linear function, making it efficient and applicable at scale. Once the feature-importance weights are revealed, the method applies a special function to the compressed data to recover accurate estimates of weighted distances. This work opens the door to faster, more flexible machine learning systems, especially in settings where priorities change dynamically, such as personalized recommendations or real-time data analysis.
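The contrast the abstract draws can be illustrated with a classical real-valued random projection. The sketch below is a plain Johnson-Lindenstrauss-style Gaussian projection, not the paper's complex-valued construction: it shows that unweighted Euclidean distances survive compression, and that weights can be folded in by pre-scaling features only when they are known before compression. All dimensions, weights, and variable names here are illustrative.

```python
import math
import random

random.seed(0)
d, k = 1000, 200  # original and reduced dimensions (illustrative values)

# Random Gaussian projection matrix, scaled by 1/sqrt(k) so that
# Euclidean distances are preserved in expectation.
R = [[random.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]

def project(x):
    """Compress a d-dimensional vector to k dimensions with the linear map R."""
    return [sum(row[i] * x[i] for i in range(d)) for row in R]

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

x = [random.gauss(0.0, 1.0) for _ in range(d)]
y = [random.gauss(0.0, 1.0) for _ in range(d)]

# Unweighted distance is approximately preserved (Johnson-Lindenstrauss).
true_d = dist(x, y)
est_d = dist(project(x), project(y))
print(f"unweighted: true={true_d:.3f} estimated={est_d:.3f}")

# Weights known BEFORE compression: scale feature i by sqrt(w_i), then project.
w = [random.uniform(0.5, 2.0) for _ in range(d)]
xs = [math.sqrt(w[i]) * x[i] for i in range(d)]
ys = [math.sqrt(w[i]) * y[i] for i in range(d)]
true_wd = math.sqrt(sum(w[i] * (x[i] - y[i]) ** 2 for i in range(d)))
est_wd = dist(project(xs), project(ys))
print(f"weighted (weights known up front): true={true_wd:.3f} estimated={est_wd:.3f}")
```

If the weights arrive only after project(x) and project(y) have been stored, the pre-scaling trick above is no longer available; supporting that "weights revealed late" regime is precisely the gap the paper's complex-valued embedding addresses.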