Similarity Search in High Dimensions via Hashing.
Aristides Gionis, Piotr Indyk, Rajeev Motwani:
Similarity Search in High Dimensions via Hashing.
VLDB 1999: 518-529@inproceedings{DBLP:conf/vldb/GionisIM99,
author = {Aristides Gionis and
Piotr Indyk and
Rajeev Motwani},
editor = {Malcolm P. Atkinson and
Maria E. Orlowska and
Patrick Valduriez and
Stanley B. Zdonik and
Michael L. Brodie},
title = {Similarity Search in High Dimensions via Hashing},
booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
UK},
publisher = {Morgan Kaufmann},
year = {1999},
isbn = {1-55860-615-7},
pages = {518-529},
ee = {db/conf/vldb/GionisIM99.html},
crossref = {DBLP:conf/vldb/99},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
The nearest- or near-neighbor query problems arise in a large variety of database
applications, usually in the context of similarity searching. Of late, there has been
increasing interest in building search/index structures for performing similarity
search over high-dimensional data, e.g., image databases, document collections,
time-series databases, and genome databases. Unfortunately, all known techniques for
solving this problem fall prey to the "curse of dimensionality." That is, the data
structures scale poorly with data dimensionality; in fact, if the number of dimensions
exceeds 10 to 20, searching in k-d trees and related structures involves the inspection
of a large fraction of the database, thereby doing no better than brute-force linear
search. It has been suggested that since the selection of features and the choice of
a distance metric in typical applications is rather heuristic, determining an
approximate nearest neighbor should suffice for most practical purposes. In this paper,
we examine a novel scheme for approximate similarity search based on hashing. The
basic idea is to hash the points from the database so as to ensure that the probability
of collision is much higher for objects that are close to each other than for those
that are far apart. We provide experimental evidence that our method gives significant
improvement in running time over other methods for searching in high-dimensional
spaces based on hierarchical tree decomposition. Experimental results also indicate
that our scheme scales well even for a relatively large number of dimensions
(more than 50).
Copyright © 1999 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Online Paper
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.):
VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK.
Morgan Kaufmann 1999, ISBN 1-55860-615-7
Contents
References
- [1]
- ...
- [2]
- Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, Angela Y. Wu:
An Optimal Algorithm for Approximate Nearest Neighbor Searching.
SODA 1994: 573-582
- [3]
- Jon Louis Bentley:
Multidimensional Binary Search Trees Used for Associative Searching.
Commun. ACM 18(9): 509-517(1975)
- [4]
- Stefan Berchtold, Daniel A. Keim:
High-Dimensional Index Structures, Database Support for Next Decade's Applications (Tutorial).
SIGMOD Conference 1998: 501
- [5]
- ...
- [6]
- Timothy M. Chan:
Approximate Nearest Neighbor Queries Revisited.
Symposium on Computational Geometry 1997: 352-358
- [7]
- R. Scott Cost, Steven Salzberg:
A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features.
Machine Learning 10: 57-78(1993)
- [8]
- Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya:
Random Sampling for Histogram Construction: How much is enough?
SIGMOD Conference 1998: 436-447
- [9]
- ...
- [10]
- Paolo Ciaccia, Marco Patella, Pavel Zezula:
A Cost Model for Similarity Queries in Metric Spaces.
PODS 1998: 59-68
- [11]
- Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, Richard A. Harshman:
Indexing by Latent Semantic Analysis.
JASIS 41(6): 391-407(1990)
- [12]
- ...
- [13]
- ...
- [14]
- ...
- [15]
- Christos Faloutsos, Ron Barber, Myron Flickner, Jim Hafner, Wayne Niblack, Dragutin Petkovic, William Equitz:
Efficient and Effective Querying by Image Content.
J. Intell. Inf. Syst. 3(3/4): 231-262(1994)
- [16]
- ...
- [17]
- Myron Flickner, Harpreet S. Sawhney, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, Peter Yanker:
Query by Image and Video Content: The QBIC System.
IEEE Computer 28(9): 23-32(1995)
- [18]
- Jerome H. Friedman, Jon Louis Bentley, Raphael A. Finkel:
An Algorithm for Finding Best Matches in Logarithmic Expected Time.
ACM Trans. Math. Softw. 3(3): 209-226(1977)
- [19]
- ...
- [20]
- ...
- [21]
- Trevor Hastie, Robert Tibshirani:
Discriminant Adaptive Nearest Neighbor Classification.
KDD 1995: 142-149
- [22]
- ...
- [23]
- ...
- [24]
- Piotr Indyk, Rajeev Motwani:
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality.
STOC 1998: 604-613
- [25]
- Kothuri Venkata Ravi Kanth, Divyakant Agrawal, Ambuj K. Singh:
Dimensionality Reduction for Similarity Searching in Dynamic Databases.
SIGMOD Conference 1998: 166-176
- [26]
- ...
- [27]
- ...
- [28]
- Norio Katayama, Shin'ichi Satoh:
The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries.
SIGMOD Conference 1997: 369-380
- [29]
- Nathan Linial, Eran London, Yuri Rabinovich:
The geometry of graphs and some of its algorithmic applications.
FOCS 1994: 577-591
- [30]
- ...
- [31]
- ...
- [32]
- ...
- [33]
- Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay:
Approximate Medians and other Quantiles in One Pass and with Limited Memory.
SIGMOD Conference 1998: 426-435
- [34]
- Yossi Matias, Jeffrey Scott Vitter, Min Wang:
Wavelet-Based Histograms for Selectivity Estimation.
SIGMOD Conference 1998: 448-459
- [35]
- Rajeev Motwani, Prabhakar Raghavan:
Randomized Algorithms.
Cambridge University Press 1995, ISBN 0-521-47465-5
- [36]
- ...
- [37]
- Alex Pentland, Rosalind W. Picard, Stan Sclaroff:
Photobook: Tools for Content-Based Manipulation of Image Databases.
Storage and Retrieval for Image and Video Databases (SPIE) 1994: 34-47
- [38]
- Gerard Salton, Michael McGill:
Introduction to Modern Information Retrieval.
McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
- [39]
- Hanan Samet:
The Design and Analysis of Spatial Data Structures.
Addison-Wesley 1990
- [40]
- ...
- [41]
- Timos K. Sellis, Nick Roussopoulos, Christos Faloutsos:
Multidimensional Access Methods: Trees Have Grown Everywhere.
VLDB 1997: 13-14
- [42]
- ...
- [43]
- John R. Smith, Shih-Fu Chang:
Visually Searching the Web for Content.
IEEE MultiMedia 4(3): 12-20(1997)
- [44]
- ...
- [45]
- Roger Weber, Hans-Jörg Schek, Stephen Blott:
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces.
VLDB 1998: 194-205
Copyright © Tue Mar 16 02:22:08 2010
by Michael Ley (ley@uni-trier.de)