ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Modeling Skewed Distribution Using Multifractals and the `80-20' Law.

Christos Faloutsos, Yossi Matias, Abraham Silberschatz: Modeling Skewed Distribution Using Multifractals and the `80-20' Law. VLDB 1996: 307-317
@inproceedings{DBLP:conf/vldb/FaloutsosMS96,
  author    = {Christos Faloutsos and
               Yossi Matias and
               Abraham Silberschatz},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {Modeling Skewed Distribution Using Multifractals and the `80-20'
               Law},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {307-317},
  ee        = {db/conf/vldb/FaloutsosMS96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

The focus of this paper is on the characterization of the skewness of an attribute-value distribution and on the extrapolations for interesting parameters. More specifically, given a vector with the highest h multiplicities mvec = (m1, m2, ..., mh), and some frequency moments Fq =\sum miq , (e.g., q=0, 2), we provide effective schemes for obtaining estimates about either its statistics or subsets/supersets of the relation.

We assume an 80/20 law, and specifically, a p/(1-p) law. This law gives a distribution which is commonly known in the fractals literature as `multifractal'. We show how to estimate p from the given information (first few multiplicities and a few moments), and present the results of our experimentations on real data. Our results demonstrate that schemes based on our multifractal assumption consistently outperforms those schemes based on the uniformity assumption, which are commonly used in current DBMSs. Moreover, our schemes can be used to provide estimates for supersets of a relation, which the uniformity assumption based schemes can not not provide at all.

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Electronic Edition

References

[1]
...
[2]
...
[3]
Alberto Belussi, Christos Faloutsos: Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. VLDB 1995: 299-310 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[4]
Alfonso F. Cardenas: Analysis and Performance of Inverted Data Base Structures. Commun. ACM 18(5): 253-263(1975) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[5]
Christos Faloutsos, H. V. Jagadish: On B-Tree Indices for Skewed Distributions. VLDB 1992: 363-374 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[6]
...
[7]
Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes: Sampling-Based Estimation of the Number of Distinct Values of an Attribute. VLDB 1995: 311-322 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[8]
Wen-Chi Hou, Gultekin Özsoyoglu: Statistical Estimators for Aggregate Relational Algebra Queries. ACM Trans. Database Syst. 16(4): 600-654(1991) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[9]
Yannis E. Ioannidis, Stavros Christodoulakis: Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results. ACM Trans. Database Syst. 18(4): 709-748(1993) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[10]
Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[11]
...
[12]
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[13]
...
[14]
...
[15]
...
[16]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[17]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Fri Mar 12 17:22:54 2010 by Michael Ley (ley@uni-trier.de)