ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Selectivity Estimation Without the Attribute Value Independence Assumption.

Viswanath Poosala, Yannis E. Ioannidis: Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB 1997: 486-495
@inproceedings{DBLP:conf/vldb/PoosalaI97,
  author    = {Viswanath Poosala and
               Yannis E. Ioannidis},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {Selectivity Estimation Without the Attribute Value Independence
               Assumption},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {486-495},
  ee        = {db/conf/vldb/PoosalaI97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

The result size of a query that involves multiple attributes from the same relation depends on these attributes' joint data distribution, i.e., the frequencies of all combinations of attribute values. To simplify the estimation of that size, most commercial systems make the attribute value independence assumption and maintain statistics (typically histograms) on individual attributes only. In reality, this assumption is almost always wrong and the resulting estimations tend to be highly inaccurate. In this paper, we propose two main alternatives to effectively approximate (multi-dimensional) joint data distributions. (a) Using a multi-dimensional histogram, (b) Using the Singular Value Decomposition (SVD) technique from linear algebra. An extensive set of experiments demonstrates the advantages and disadvantages of the two approaches and the benefits of both compared to the independence assumption.

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Electronic Edition

From CS Dept., University Trier (Germany)

References

[1]
...
[2]
Stavros Christodoulakis: Implications of Certain Assumptions in Database Performance Evaluation. ACM Trans. Database Syst. 9(2): 163-186(1984) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[3]
J. G. Griffiths: An Algorithm for Displaying a Class of Space-filling Curves. Softw., Pract. Exper. 16(5): 403-411(1986) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[4]
Yannis E. Ioannidis: Universality of Serial Histograms. VLDB 1993: 256-267 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[5]
Yannis E. Ioannidis, Stavros Christodoulakis: Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results. ACM Trans. Database Syst. 18(4): 709-748(1993) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[6]
Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[7]
H. V. Jagadish: Linear Clustering of Objects with Multiple Atributes. SIGMOD Conference 1990: 332-342 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[8]
Ibrahim Kamel, Christos Faloutsos: Hilbert R-tree: An Improved R-tree using Fractals. VLDB 1994: 500-509 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[9]
Robert Kooi: The Optimization of Queries in Relational Databases. Ph.D. thesis, Case Western Reserve University 1980
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[10]
...
[11]
Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[12]
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[13]
B. Muthuswamy, Larry Kerschberg: A Detailed Database Statistics Model for Realtional Query Optimization. ACM Annual Conference - The range of computing: mid-80's perspective 1985: 439-448 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[14]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[15]
...
[16]
Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[17]
William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery: Numerical Recipes in C, 2nd Edition. Cambridge University Press 1992
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[18]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[19]
Wei Sun, Yibei Ling, Naphtali Rishe, Yi Deng: An Instant and Accurate Estimation Method for Joins and Selection in a Retrieval-Intensive Environment. SIGMOD Conference 1993: 79-88 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[20]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Tue Mar 16 02:22:07 2010 by Michael Ley (ley@uni-trier.de)