ICML Poster Dynamic Similarity Graph Construction with Kernel Density Estimation

Steinar Laenen · Peter Macgregor · He Sun

East Exhibition Hall A-B #E-2004

Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: In the kernel density estimation (KDE) problem, we are given a set $X$ of data points in $\mathbb{R}^d$, a kernel function $k: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}$, and a query point $\mathbf{q} \in \mathbb{R}^d$, and the objective is to quickly output an estimate of $\sum_{\mathbf{x} \in X} k(\mathbf{q}, \mathbf{x})$. In this paper, we consider $\textsf{KDE}$ in the dynamic setting, and introduce a data structure that efficiently maintains the _estimates_ for a set of query points as data points are added to $X$ over time. Based on this, we design a dynamic data structure that maintains a sparse approximation of the fully connected similarity graph on $X$, and develop a fast dynamic spectral clustering algorithm. We further evaluate the effectiveness of our algorithms on both synthetic and real-world datasets.
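To make the quantity in the abstract concrete, here is a minimal sketch of the dynamic setting: a fixed set of query points whose values $\sum_{\mathbf{x} \in X} k(\mathbf{q}, \mathbf{x})$ are kept up to date as points are added to $X$. This is an exact, naive baseline and not the data structure introduced in the paper. The class name `NaiveDynamicKDE`, its methods, and the choice of a Gaussian kernel with bandwidth `sigma` are all assumptions made for illustration.

```python
import math


def gaussian_kernel(q, x, sigma=1.0):
    """k(q, x) = exp(-||q - x||^2 / (2 * sigma^2)).

    The Gaussian kernel and the bandwidth sigma are illustrative assumptions.
    """
    sq_dist = sum((qi - xi) ** 2 for qi, xi in zip(q, x))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


class NaiveDynamicKDE:
    """Hypothetical exact baseline, not the paper's data structure.

    Stores sum_{x in X} k(q, x) for every registered query point q and
    updates each stored sum when a new data point is added to X.
    """

    def __init__(self, queries, kernel=gaussian_kernel):
        self.queries = [tuple(q) for q in queries]
        self.kernel = kernel
        self.sums = [0.0] * len(self.queries)
        self.num_points = 0

    def add_point(self, x):
        # One kernel evaluation per query point, so each insertion costs
        # time linear in the number of queries.
        for i, q in enumerate(self.queries):
            self.sums[i] += self.kernel(q, x)
        self.num_points += 1

    def value(self, i):
        """Return the exact kernel sum for the i-th query point."""
        return self.sums[i]


# Usage: two query points, three data points arriving one at a time.
kde = NaiveDynamicKDE(queries=[(0.0, 0.0), (5.0, 5.0)])
for point in [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]:
    kde.add_point(point)
print(kde.value(0), kde.value(1))
```

Each insertion in this baseline touches every query point, which is the kind of per-update cost a dynamic KDE data structure aims to reduce by maintaining estimates rather than exact sums.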

Lay Summary:

Imagine you have a constantly growing collection of items, like photos or social media posts, and you want to automatically group similar items together. A common way to achieve this is to first figure out how similar each pair of items is, which can be a lot of work if you have many items. This similarity information can be thought of as a network or "graph". The problem is, when new items arrive, recalculating all these similarities and updating the groups from scratch is very slow. A key part of this similarity calculation is quickly estimating how many items are "around" any given item, a process called Kernel Density Estimation (KDE).

This research provides a new, much faster way to do this KDE even as new items are added, without redoing all the calculations each time. Building on this, the paper introduces a method to efficiently maintain a simplified version of the full similarity network that still captures the main groups or "clusters". This means we can update our understanding of the groups quickly when new data arrives.

This work makes it practical to find and track evolving groups in large, constantly changing datasets. This is useful for scenarios like understanding how communities form in social networks or how topics trend online, by making the process faster and more scalable than previous methods while still finding accurate groupings.
