ACM SIGMOD Anthology VLDB dblp.uni-trier.de

A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces.

Roger Weber, Hans-Jörg Schek, Stephen Blott: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB 1998: 194-205
@inproceedings{DBLP:conf/vldb/WeberSB98,
  author    = {Roger Weber and
               Hans-J{\"o}rg Schek and
               Stephen Blott},
  editor    = {Ashish Gupta and
               Oded Shmueli and
               Jennifer Widom},
  title     = {A Quantitative Analysis and Performance Study for Similarity-Search
               Methods in High-Dimensional Spaces},
  booktitle = {VLDB'98, Proceedings of 24rd International Conference on Very
               Large Data Bases, August 24-27, 1998, New York City, New York,
               USA},
  publisher = {Morgan Kaufmann},
  year      = {1998},
  isbn      = {1-55860-566-5},
  pages     = {194-205},
  ee        = {db/conf/vldb/WeberSB98.html},
  crossref  = {DBLP:conf/vldb/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

For similarity search in high-dimensional vector spaces (or 'HDVSs'), researchers have proposed a number of new methods (or adaptations of existing methods) based, in the main, on data-space partitioning. However, the performance of these methods generally degrades as dimensionality increases. Although this phenomenon - known as the 'dimensional curse'- is well known, little or no quantitative analysis of the phenomenon is available. In this paper, we provide a detailed analysis of partitioning and clustering techniques for similarity search in HDVSs. We show formally that these methods exhibit linear complexity at high dimensionality, and that existing methods are outperformed on average by a simple sequential scan if the number of dimensions exceeds around 10. Consequently, we come up with an alternative organization based on approximations to make the unavoidable sequential scan as fast as possible. We describe a simple vector approximation scheme, called VA-file, and report on an experimental evaluation of this and of two tree-based index methods (an R*-tree and an X-tree).

Copyright © 1998 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...

ACM SIGMOD Anthology

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.): VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA. Morgan Kaufmann 1998, ISBN 1-55860-566-5
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

References

[1]
Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[2]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[3]
Jon Louis Bentley, Jerome H. Friedman: Data Structures for Range Searching. ACM Comput. Surv. 11(4): 397-409(1979) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[4]
Stefan Berchtold, Christian Böhm, Bernhard Braunmüller, Daniel A. Keim, Hans-Peter Kriegel: Fast Parallel Similarity Search in Multimedia Databases. SIGMOD Conference 1997: 1-12 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[5]
Stefan Berchtold, Christian Böhm, Daniel A. Keim, Hans-Peter Kriegel: A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space. PODS 1997: 78-86 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[6]
Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel: Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations. EDBT 1998: 216-230 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[7]
Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree : An Index Structure for High-Dimensional Data. VLDB 1996: 28-39 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[8]
Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider: Comparison of Approximations of Complex Objects Used for Approximation-based Query Processing in Spatial Database Systems. ICDE 1993: 40-49 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[9]
Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: Multi-Step Processing of Spatial Joins. SIGMOD Conference 1994: 197-208 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[10]
Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997: 426-435 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[11]
...
[12]
...
[13]
...
[14]
Christos Faloutsos: Access Methods for Text. ACM Comput. Surv. 17(1): 49-74(1985) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[15]
...
[16]
Christos Faloutsos, Stavros Christodoulakis: Description and Performance Analysis of Signature File Methods for Office Filing. ACM Trans. Inf. Syst. 5(3): 237-257(1987) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[17]
Christos Faloutsos, Ibrahim Kamel: Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension. PODS 1994: 4-13 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[18]
Raphael A. Finkel, Jon Louis Bentley: Quad Trees: A Data Structure for Retrieval on Composite Keys. Acta Inf. 4: 1-9(1974) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[19]
Myron Flickner, Harpreet S. Sawhney, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, Peter Yanker: Query by Image and Video Content: The QBIC System. IEEE Computer 28(9): 23-32(1995) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[20]
Jerome H. Friedman, Jon Louis Bentley, Raphael A. Finkel: An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Trans. Math. Softw. 3(3): 209-226(1977) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[21]
Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[22]
Joseph M. Hellerstein, Elias Koutsoupias, Christos H. Papadimitriou: On the Analysis of Indexing Schemes. PODS 1997: 249-256 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[23]
Gísli R. Hjaltason, Hanan Samet: Ranking in Spatial Databases. SSD 1995: 83-95 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[24]
Norio Katayama, Shin'ichi Satoh: The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries. SIGMOD Conference 1997: 369-380 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[25]
King-Ip Lin, H. V. Jagadish, Christos Faloutsos: The TV-Tree: An Index Structure for High-Dimensional Data. VLDB J. 3(4): 517-542(1994) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[26]
David B. Lomet, Betty Salzberg: The hB-Tree: A Multiattribute Indexing Method with Good Guaranteed Performance. ACM Trans. Database Syst. 15(4): 625-658(1990) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[27]
Jürg Nievergelt, Hans Hinterberger, Kenneth C. Sevcik: The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Trans. Database Syst. 9(1): 38-71(1984) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[28]
John T. Robinson: The K-D-B-Tree: A Search Structure For Large Multidimensional Dynamic Indexes. SIGMOD Conference 1981: 10-18 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[29]
Hanan Samet: The Design and Analysis of Spatial Data Structures. Addison-Wesley 1990
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[30]
Timos K. Sellis, Nick Roussopoulos, Christos Faloutsos: The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. VLDB 1987: 507-518 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[31]
Robert F. Sproull: Refinements to Nearest-Neighbor Searching in k-Dimensional Trees. Algorithmica 6(4): 579-589(1991) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[32]
Markus A. Stricker, Markus Orengo: Similarity of Color Images. Storage and Retrieval for Image and Video Databases (SPIE) 1995: 381-392 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[33]
Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume I. Computer Science Press 1988, ISBN 0-7167-8158-1
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[34]
...

Copyright © Tue Mar 16 02:22:07 2010 by Michael Ley (ley@uni-trier.de)