ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Clustering Techniques for Minimizing External Path Length.

Ajit A. Diwan, Sanjeeva Rane, S. Seshadri, S. Sudarshan: Clustering Techniques for Minimizing External Path Length. VLDB 1996: 342-353
@inproceedings{DBLP:conf/vldb/DiwanRSS96,
  author    = {Ajit A. Diwan and
               Sanjeeva Rane and
               S. Seshadri and
               S. Sudarshan},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {Clustering Techniques for Minimizing External Path Length},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {342-353},
  ee        = {db/conf/vldb/DiwanRSS96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

There are a variety of main-memory access structures, such as segment trees, and quad trees, whose properties, such as good worst-case behaviour, make them attractive for database applications. Unfortunately, the structures are typically `long and skinny', whereas disk data structures must be `short-and-fat' (that is, have a high fanout and low height) in order to minimize I/O.

We consider how to cluster the nodes (that is, map the nodes to disk pages) of mainmemory access structures such that that although a path may traverse many nodes, it only traverses a few disk pages. The number of disk pages traversed in a path is called the external path length. We address several version of the clustering problem. We present a clustering algorithm for tree structures that generates optimal worst-case external path length mappings; we also show how to make it dynamic, to support updates. We extend the algorithm to generate mappings that minimize the average weighted external path lengths. We also show that some other clustering problems, such as finding optimal external path lengths for DAG structures and minimizing weights for optimal height mappings, are NP-complete. We present efficient heurisitcs for these problem.

We present a performance study (using quad-trees on actual image data as an example) which shows that our algorithms perform well. Our algorithms can also be used for clustering complex objects in object-oriented databases.

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Electronic Edition

References

[BKKG88]
Jay Banerjee, Won Kim, Sung-Jo Kim, Jorge F. Garza: Clustering a DAG for CAD Databases. IEEE Trans. Software Eng. 14(11): 1684-1699(1988) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[BKSS90]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Ch91]
Jia-bing R. Cheng, Ali R. Hurson: Effective Clustering of Complex Objects in Object-Oriented Databases. SIGMOD Conference 1991: 22-31 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[CK89]
Ellis E. Chang, Randy H. Katz: Exploiting Inheritance and Structure Semantics for Effective Clustering and Buffering in an Object-Oriented DBMS. SIGMOD Conference 1989: 348-357 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[DSST86]
James R. Driscoll, Neil Sarnak, Daniel Dominic Sleator, Robert Endre Tarjan: Making Data Structures Persistent. STOC 1986: 109-121 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Gut84]
Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Jag90]
H. V. Jagadish: Linear Clustering of Objects with Multiple Atributes. SIGMOD Conference 1990: 332-342 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[KTF95]
Anil Kumar, Vassilis J. Tsotras, Christos Faloutsos: Access Methods for Bi-Temporal Databases. Temporal Databases 1995: 235-254 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[LS92]
Qing Li, John L. Smith: A Conceptual Model for Dynamic Clustering in Object Databases. VLDB 1992: 457-468 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[NGV93]
Mark H. Nodine, Michael T. Goodrich, Jeffrey Scott Vitter: Blocking for External Graph Searching. PODS 1993: 222-232 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Ore89]
Jack A. Orenstein: Redundancy in Spatial Databases. SIGMOD Conference 1989: 295-305 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Ore90]
Jack A. Orenstein: A Comparison of Spatial Query Processing Techniques for Native and Parameter Spaces. SIGMOD Conference 1990: 343-352 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[RS94]
Sridhar Ramaswamy, Sairam Subramanian: Path Caching: A Technique for Optimal External Searching. PODS 1994: 25-35 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Sam95]
Hanan Samet: Spatial Data Structures. Modern Database Systems 1995: 361-385 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SB93]
Clifford A. Shaffer, Patrick R. Brown: A Paging Scheme for Pointer-Based Quadtrees. SSD 1993: 89-104 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Sch77]
Mario Schkolnick: A Clustering Algorithm for Hierarchical Structures. ACM Trans. Database Syst. 2(1): 27-44(1977) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SRF87]
Timos K. Sellis, Nick Roussopoulos, Christos Faloutsos: The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. VLDB 1987: 507-518 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SSN90]
Clifford A. Shaffer, Hanan Samet, Randal C. Nelson: QUILT: A Geographic Information System based on Quadtrees. IJGIS 4(2): 103-131(1990) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[TN91]
Manolis M. Tsangaris, Jeffrey F. Naughton: A Stochastic Approach for Clustering in Object Bases. SIGMOD Conference 1991: 12-21 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[TN92]
Manolis M. Tsangaris, Jeffrey F. Naughton: On the Performance of Object Clustering Techniques. SIGMOD Conference 1992: 144-153 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Tue Mar 16 02:22:05 2010 by Michael Ley (ley@uni-trier.de)