ICML Poster Correlation Clustering Beyond the Pivot Algorithm

Poster

Correlation Clustering Beyond the Pivot Algorithm

Soheil Behnezhad · Moses Charikar · Vincent Cohen-Addad · Alma Ghafari · Weiyun Ma

East Exhibition Hall A-B #E-2007

Thu 17 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: We study the classic correlation clustering problem. Given $n$ objects and a complete labeling of the object-pairs as either “similar” or “dissimilar”, the goal is to partition the objects into arbitrarily many clusters while minimizing disagreements with the labels.

The classic Pivot algorithm, due to [Ailon et al STOC'05], obtains a 3-approximation for this problem. Over the years, this algorithm has been successfully implemented in various settings. The downside of the Pivot algorithm is that the approximation analysis of 3 is tight for it. While better approximations have been achieved in some settings, the corresponding algorithms are often hard to implement across different settings. For example, [Behnezhad et al FOCS'19] showed that the output of Pivot can be maintained in polylog time per update in a dynamic setting, a bound that was improved to constant by [Dalirrooyfard et al ICML'24]. But obtaining a better approximation remains open.

In this paper, we present Modified Pivot, an algorithm that locally improves the output of Pivot. Our Modified Pivot algorithm can be implemented just as efficiently as Pivot in various settings. Our experiments show that the output of Modified Pivot on average makes less than 77% of the mistakes made by Pivot. More surprisingly, we prove theoretically that Modified Pivot has approximation ratio $3-\epsilon_0$ for some absolute constant $\epsilon_0 > 0$. This, e.g., leads to a better than 3 approximation in the dynamic setting in polylog time, improving the 3-approximation obtained by [Behnezhad et al FOCS'19] and [Dalirrooyfard et al ICML'24].
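To make the objective concrete, here is a minimal sketch of how the number of disagreements for a given clustering could be counted. The function name `disagreements` and the input encoding (objects numbered `0..n-1`, the "similar" pairs given as a set of `frozenset` pairs, every other pair treated as "dissimilar") are assumptions made for illustration and do not come from the paper.

```python
from itertools import combinations


def disagreements(n, similar, clusters):
    """Count the object-pairs whose label disagrees with the clustering.

    n        -- number of objects, assumed to be numbered 0..n-1
    similar  -- set of frozenset({u, v}) pairs labeled "similar";
                every other pair is treated as "dissimilar"
    clusters -- list of lists that partitions 0..n-1
    """
    # Map each object to the index of the cluster that contains it.
    cluster_of = {}
    for idx, cluster in enumerate(clusters):
        for v in cluster:
            cluster_of[v] = idx

    cost = 0
    for u, v in combinations(range(n), 2):
        together = cluster_of[u] == cluster_of[v]
        is_similar = frozenset((u, v)) in similar
        # A disagreement is a similar pair that is split across clusters,
        # or a dissimilar pair that is placed in the same cluster.
        if together != is_similar:
            cost += 1
    return cost
```

For three objects where only the pairs (0, 1) and (1, 2) are similar, no clustering is perfect under this encoding: putting all three together pays for the dissimilar pair (0, 2), while any split separates at least one similar pair.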

Lay Summary:

We study the classic correlation clustering problem. Given objects and a complete labeling of the object-pairs as either “similar” or “dissimilar”, the goal is to partition the objects into arbitrarily many clusters while minimizing disagreements with the labels.

A well-known method (called Pivot) does a pretty good job: it picks one object at random, gathers all the objects similar to it into one group, removes them, and repeats until everything is grouped. This approach guarantees that, in expectation, you won’t make more than three times as many mistakes as the best possible grouping. However, that “three times” bound is the best Pivot can do, and in many practical situations one would like to do better without adding complexity.

In our work, we introduce a change to Pivot that we call Modified Pivot. It still runs just as quickly and easily as the original method, but it makes noticeably fewer mistakes in practice. Our experiments show that, on average, it makes less than 77% of the mistakes made by Pivot. Even more importantly, we prove there is a small but definite improvement in the worst-case guarantee: the number of mistakes Modified Pivot makes is below three times the optimum by a fixed constant factor. This small but real gain carries over to “dynamic” settings, where labels change over time, giving a more accurate grouping.
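The Pivot procedure described above can be sketched in a few lines. This is an illustrative sketch of the classic Pivot algorithm only, not of Modified Pivot, whose local improvement step is not spelled out on this page. The function name `pivot`, the `seed` parameter, and the input encoding (objects numbered `0..n-1`, "similar" pairs given as a set of `frozenset` pairs) are assumptions made for the example. Processing the objects in a uniformly random order and skipping those already grouped is used here as an equivalent way of picking a random remaining object each round.

```python
import random


def pivot(n, similar, seed=None):
    """Cluster objects 0..n-1 with the classic Pivot procedure.

    similar -- set of frozenset({u, v}) pairs labeled "similar";
               every other pair is treated as "dissimilar"
    seed    -- optional seed for reproducible runs (illustrative parameter)
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # random processing order of the objects

    clustered = set()
    clusters = []
    for p in order:
        if p in clustered:
            continue  # p was already absorbed by an earlier pivot
        # The pivot p gathers every remaining object that is similar to it.
        cluster = [p] + [
            v for v in range(n)
            if v != p and v not in clustered and frozenset((p, v)) in similar
        ]
        clustered.update(cluster)
        clusters.append(cluster)
    return clusters
```

Each object ends up in exactly one cluster, and a cluster only ever contains a pivot together with objects labeled similar to that pivot. This direct version scans all remaining objects for every pivot, so it is meant for clarity rather than speed.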
