Fast Graph Vector Search via Hardware Acceleration and Delayed-Synchronization Traversal

Authors:

Wenqi Jiang, Hang Hu, Torsten Hoefler, Gustavo Alonso

Abstract

Vector search systems are indispensable in large language model (LLM) serving, search engines, and recommender systems, where minimizing online search latency is essential. Among various algorithms, graph-based vector search (GVS) is particularly popular due to its high search performance and quality. However, reducing GVS latency by intra-query parallelization remains challenging due to limitations imposed by both existing hardware architectures (CPUs and GPUs) and the inherent difficulty of parallelizing graph traversals. To efficiently serve low-latency GVS, we co-design hardware and algorithm by proposing Falcon and Delayed-Synchronization Traversal (DST). Falcon is a hardware GVS accelerator that implements efficient GVS operators, pipelines these operators, and reduces memory accesses by tracking search states with an on-chip Bloom filter. DST is an efficient graph traversal algorithm that simultaneously improves search performance and quality by relaxing traversal orders to maximize accelerator utilization. Evaluation across various graphs and datasets shows that Falcon, prototyped on FPGAs, together with DST, achieves up to 4.3× and 19.5× lower latency and up to 8.0× and 26.9× improvements in energy efficiency over CPU- and GPU-based GVS systems, respectively.
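
To make the idea of tracking search state with a Bloom filter concrete, the following is a minimal software sketch, not Falcon's hardware design and not DST. It assumes a plain best-first traversal over a proximity graph, with the visited set held in a small Bloom filter instead of an exact set. All names (`BloomFilter`, `greedy_search`, `ef`, the hash choice, the filter size) are hypothetical and chosen only for illustration.

```python
import hashlib
import heapq


class BloomFilter:
    """Fixed-size bit array probed by several hashes.

    False positives are possible (a node may be wrongly skipped);
    false negatives are not (a visited node is never revisited).
    """

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive one bit position per seed from a keyed digest (illustrative choice).
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_search(graph, vectors, query, entry, k=2, ef=4):
    """Sequential best-first search that keeps at most ef result candidates."""
    visited = BloomFilter()
    visited.add(entry)
    d0 = squared_l2(vectors[entry], query)
    candidates = [(d0, entry)]   # min-heap: closest unexplored node first
    results = [(-d0, entry)]     # max-heap via negation: farthest kept result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        # Stop once the closest candidate is farther than the worst kept result.
        if len(results) >= ef and dist > -results[0][0]:
            break
        for nbr in graph[node]:
            if nbr in visited:
                continue
            visited.add(nbr)
            d = squared_l2(vectors[nbr], query)
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nbr))
                heapq.heappush(results, (-d, nbr))
                if len(results) > ef:
                    heapq.heappop(results)
    ranked = sorted((-neg_d, node) for neg_d, node in results)
    return [node for _, node in ranked[:k]]
```

The Bloom filter trades a small chance of skipping an unvisited node for a fixed memory footprint. The loop above evaluates one candidate at a time in strict distance order, which is the ordering constraint that the abstract says DST relaxes.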

PVLDB is part of the VLDB Endowment Inc.

Volume 18, No. 11
