go back

Volume 18, No. 10

Dynamic Range-Filtering Approximate Nearest Neighbor Search

Authors:
Zhencan Peng, Miao Qiao, Wenchao Zhou, Feifei Li, Dong Deng

Abstract

Range-filtering approximate nearest neighbor search ( RFANNS ) has gained significant attention recently. Consider a set D of highdimensional vectors, each associated with a numeric attribute value, e.g., price or timestamp. An RFANNS query consists of a query vector 𝑞 and a query range, reporting the approximate nearest neighbors of 𝑞 among data vectors whose attributes fall in the query range. Existing work on RFANNS only considers a static set D of data vectors while in many real-world scenarios, vectors arrive in the system in an arbitrary order. This paper studies dynamic RFANNS where both data vectors and queries arrive in a mixed stream: a query is posed on all the data vectors that have already arrived in the system. Existing work on RFANNS is difficult to be extended to the streaming setting as they construct the index in the order of the attribute values while the vectors arrive in the system in an arbitrary order. The main challenge to the dynamic RFANNS lies in the difference between the two orders. A naive approach to RFANNS maintains multiple hierarchical navigable small-world (HNSW) graphs, one for each of the 𝑂 (|D| 2 ) possible query ranges – too expensive to construct and maintain. To design an index structure that can integrate new data vectors with a low index size increment for efficient and effective query processing, we propose a structure called dynamic segment graph . It compresses the set of HNSW graphs of the naive approach, proven to be lossless under certain conditions, with only a linear to log |D| new edges in expectation when inserting a new vector. This dramatically reduces the index size while largely preserving the search performance. We further propose heuristics to significantly reduce the index cost of our dynamic segment graph in practice. Extensive experimental results show that our approach outperforms existing methods for static RFANNS and is scalable in handling dynamic RFANNS .

PVLDB is part of the VLDB Endowment Inc.

Privacy Policy