Workshop
The 1st Workshop on Vector Databases
Yusuke Matsui · Martin Aumüller · Magdalen Dobson Manohar · Harsha Vardhan Simhadri
West Meeting Room 208-209
Fri 18 Jul, 8:30 a.m. PDT
Vector databases (Vector DBs) are a foundational application layer for injecting information into large language models (LLMs). Although companies have proposed various vector databases, no academic workshop has previously existed to discuss these systems comprehensively. This workshop aims to foster discussion of vector databases from a range of perspectives, from mathematical theory to implementation-level optimization. Topics include retrieval-augmented generation (RAG), algorithms and data structures for approximate nearest neighbor (ANN) search, data management systems for handling vector data, query languages, and embedding models. The workshop will also serve as a platform for companies and researchers working on vector databases to present technical details (white papers) and exchange ideas.
Schedule
Fri 8:30 a.m. - 8:40 a.m. | Opening Remarks (Opening) | Yusuke Matsui
Fri 8:40 a.m. - 9:25 a.m. | Keynote 1: Vector search for machine learning and machine learning for vector search (Keynote) | Matthijs Douze
Fri 9:30 a.m. - 9:50 a.m. | Oral 1: A Bi-metric Framework for Efficient Nearest Neighbor Search (Oral) | Haike Xu · Piotr Indyk · Sandeep Silwal
Fri 9:50 a.m. - 10:10 a.m. | Oral 1: The RaBitQ Library (Oral) | Jianyang Gao · Yutong Gou · Yuexuan Xu · Jifan Shi · Zhonghao Yang · Cheng Long
Fri 10:10 a.m. - 10:30 a.m. | Oral 1: IVF$^{2}$ Index: Fusing Classic and Spatial Inverted Indices for Fast Filtered ANNS (Oral) | Ben Landrum · Magdalen Manohar · Mazin Karjikar · Laxman Dhulipala
Fri 10:30 a.m. - 12:00 p.m. | Poster Session 1 (Poster Session)
→ Scalable k-Means Clustering for Large k via Seeded Approximate Nearest-Neighbor Search (Poster) | Eliot Robson · Jack Spalding-Jamieson · Da Wei Zheng
→ IVF$^{2}$ Index: Fusing Classic and Spatial Inverted Indices for Fast Filtered ANNS (Poster) | Ben Landrum · Magdalen Manohar · Mazin Karjikar · Laxman Dhulipala
→ Unified Multimodal Interleaved Document Representation for Retrieval (Poster) | Jaewoo Lee · Joonho Ko · Jinheon Baek · Soyeong Jeong · Sung Ju Hwang
→ FrugalRAG: Learning to retrieve and reason for multi-hop QA (Poster) | Abhinav Java · Srivathsan Koundinyan · Nagarajan Natarajan · Amit Sharma
→ The RaBitQ Library (Poster) | Jianyang Gao · Yutong Gou · Yuexuan Xu · Jifan Shi · Zhonghao Yang · Cheng Long
→ Graph-Based Vector Search: An Experimental Evaluation of the State-of-the-Art (Poster) | Ilias Azizi · Karima Echihabi · Themis Palpanas · Vassilis Christophides
→ DistributedANN: Efficient Scaling of a Single DiskANN Graph Across Thousands of Computers (Poster) | Philip Adams · Menghao Li · Shi Zhang · Li Tan · Qi Chen · Mingqin Li · Zengzhong Li · Knut Risvik · Harsha Vardhan Simhadri
→ Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG (Poster) | Roie Kazoom · Raz Lapid · Moshe Sipper · Ofer Hadar
→ ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning (Poster) | Changtai Zhu · Siyin Wang · Ruijun Feng · Kai Song · Xipeng Qiu
→ A Bi-metric Framework for Efficient Nearest Neighbor Search (Poster) | Haike Xu · Piotr Indyk · Sandeep Silwal
Fri 12:00 p.m. - 1:00 p.m. | Lunch
Fri 1:00 p.m. - 1:45 p.m. | Keynote 2: GPU-oriented High-Performance Graph-based Approximate Nearest Neighbor Search (Keynote) | Hiroyuki Ootomo
Fri 1:45 p.m. - 3:00 p.m. | Poster Session 2 (Poster Session)
→ $\alpha$-Reachable Graphs for Multi-vector Nearest Neighbor Search (Poster) | Siddharth Gollapudi · Ravishankar Krishnaswamy · Ben Landrum · Nikhil Rao · Kirankumar Shiragur · Sandeep Silwal · Harsh Wardhan
→ Scaling Laws for Nearest Neighbor Search (Poster) | Philip Sun · Felix Chern · Yaroslav Akhremtsev · Ruiqi Guo · David Simcha · Sanjiv Kumar
→ LEANN: A Low-Storage Vector Index for Personal Devices (Poster) | Yichuan Wang · Shu Liu · Zhifei Li · Yongji Wu · Ziming Mao · Yilong Zhao · Xiao Yan · Zhiying Xu · Yang Zhou · Ion Stoica · Sewon Min · Matei Zaharia · Joseph E Gonzalez
→ Enhancing Retrieval Systems with Inference-Time Logical Reasoning (Poster) | Felix Faltings · Wei Wei · Yujia Bao
→ Learning Filter-Aware Distance Metrics for Nearest Neighbor Search with Multiple Filters (Poster) | Gopal R Srinivasa · Ananya Sutradhar · Suryansh Gupta · Ravishankar Krishnaswamy · Kirankumar Shiragur
→ Vector Data Search with Sorting Transformation (Poster) | Hongzhi Wang · Tanveer Syeda-Mahmood
→ Entropy-Based Dynamic Hybrid Retrieval for Adaptive Query Weighting in RAG Pipelines (Poster) | John Perez · James Zhou · Nguyen Le · Alexander Menchtchikov · Ryan Lagasse
→ Accurate and Efficient Metadata Filtering in Pinecone’s Serverless Vector Database (Poster) | Amir Ingber · Edo Liberty
→ Rewrite-to-Rank: Optimizing Ad Visibility via Retrieval-Aware Text Rewriting (Poster) | Chloe Ho · Ishneet Singh · Diya Sharma · Tanvi Anumandla · Kevin Zhu · Michael Lu · Vasu Sharma
→ Down with the Hierarchy: The ‘H’ in HNSW Stands for “Hubs” (Poster) | Blaise Munyampirwa · Vihan Lakshman · Benjamin Coleman
→ Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs (Poster) | Ryan Synk · Monte Hoover · John Kirchenbauer · Neel Jain · Alex Stein · Manli Shu · Josue Melendez Sanchez · Ramani Duraiswami · Tom Goldstein
Fri 3:00 p.m. - 3:30 p.m. | Coffee Break
Fri 3:30 p.m. - 4:15 p.m. | Keynote 3: Vector Search for Large-Scale Genomic Discovery (Keynote) | Prashant Pandey
Fri 4:15 p.m. - 4:35 p.m. | Oral 2: $\alpha$-Reachable Graphs for Multi-vector Nearest Neighbor Search (Oral) | Siddharth Gollapudi · Ravishankar Krishnaswamy · Ben Landrum · Nikhil Rao · Kirankumar Shiragur · Sandeep Silwal · Harsh Wardhan
Fri 4:35 p.m. - 4:55 p.m. | Oral 2: Enhancing Retrieval Systems with Inference-Time Logical Reasoning (Oral) | Felix Faltings · Wei Wei · Yujia Bao
Fri 4:55 p.m. - 5:15 p.m. | Oral 2: Down with the Hierarchy: The ‘H’ in HNSW Stands for “Hubs” (Oral) | Blaise Munyampirwa · Vihan Lakshman · Benjamin Coleman
Fri 5:15 p.m. - 5:30 p.m. | Closing Remarks (Closing) | Yusuke Matsui