Poster
in
Workshop: The 1st Workshop on Vector Databases
Down with the Hierarchy: The ‘H’ in HNSW Stands for “Hubs”
Blaise Munyampirwa · Vihan Lakshman · Benjamin Coleman
Fri 18 Jul 8:30 a.m. PDT — 5:30 p.m. PDT
Driven by recent breakthrough advances in neuralrepresentation learning, approximate nearneighbor(ANN) search over vector embeddingshas emerged as a critical computational workload.With the introduction of the seminal HierarchicalNavigable Small World (HNSW) algorithm,graph-based indexes have established themselvesas the overwhelmingly dominant paradigm for efficientand scalable ANN search. As the namesuggests, HNSW searches a layered hierarchicalgraph to quickly identify neighborhoods of similarpoints to a given query vector. But is thishierarchy even necessary? A rigorous experimentalanalysis to answer this question would providevaluable insights into the nature of algorithm designfor ANN search and motivate directions forfuture work in this increasingly crucial domain.We conduct an extensive benchmarking study coveringmore large-scale datasets than prior investigationsof this question. We ultimately find thata flat navigable small world graph graph retainsall of the benefits of HNSW on high-dimensionaldatasets, with latency and recall performance essentiallyidentical to the original algorithm butwith less memory overhead. Furthermore, we go astep further and study why the hierarchy of HNSWprovides no benefit in high dimensions, hypothesizingthat navigable small world graphs containa well-connected, frequently traversed “highway”of hub nodes that maintain the same purportedfunction as the hierarchical layers. We presentcompelling empirical evidence that the Hub HighwayHypothesis holds for real datasets and investigatethe mechanisms by which the highwayforms. The implications of this hypothesis mayalso provide future research directions in developingenhancements to graph-based ANN search.