ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation.

Arnd Christian König, Gerhard Weikum: Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. VLDB 1999: 423-434
@inproceedings{DBLP:conf/vldb/KonigW99,
  author    = {Arnd Christian K{\"o}nig and
               Gerhard Weikum},
  editor    = {Malcolm P. Atkinson and
               Maria E. Orlowska and
               Patrick Valduriez and
               Stanley B. Zdonik and
               Michael L. Brodie},
  title     = {Combining Histograms and Parametric Curve Fitting for Feedback-Driven
               Query Result-size Estimation},
  booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
               Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
               UK},
  publisher = {Morgan Kaufmann},
  year      = {1999},
  isbn      = {1-55860-615-7},
  pages     = {423-434},
  ee        = {db/conf/vldb/KonigW99.html},
  crossref  = {DBLP:conf/vldb/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

This paper aims to improve the accuracy of query result-size estimations in query optimizers by leveraging the dynamic feedback obtained from observations on the executed query workload. To this end, an approximate "synopsis" of data-value distributions is devised that combines histograms with parametric curve fitting, leading to a specific class of linear splines. The approach reconciles the benefits of histograms, simplicity and versatility, with those of parametric techniques especially the adaptivity to statistically biased and dynamically evolving query workloads.

The paper presents efficient algorithms for constructing the linear-spline synopsis for data-value distributions from a moving window of the most recent observations on (the most critical) query executions. The approach is worked out in full detail for capturing frequency as well as density distributions of data values, and it is shown how result size estimations are inferred for exact-match and range queries as well as projections and grouping. To a large extent, the developed methods can be generalized to multi-dimensional distributions, thus bearing the ability to capture correlations among attributes as well. Experimental studies underline the accuracy of the developed estimation methods, outperforming the best known classes of histograms.

Copyright © 1999 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.): VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK. Morgan Kaufmann 1999, ISBN 1-55860-615-7
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

References

[1]
Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[2]
Surajit Chaudhuri: An Overview of Query Optimization in Relational Systems. PODS 1998: 34-43 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[3]
Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya: Random Sampling for Histogram Construction: How much is enough? SIGMOD Conference 1998: 436-447 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[4]
Chung-Min Chen, Nick Roussopoulos: Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994: 161-172 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[5]
...
[6]
...
[7]
Sumit Ganguly, Phillip B. Gibbons, Yossi Matias, Abraham Silberschatz: Bifocal Sampling for Skew-Resistant Join Size Estimation. SIGMOD Conference 1996: 271-281 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[8]
Phillip B. Gibbons, Yossi Matias: New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998: 331-342 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[9]
...
[10]
Phillip B. Gibbons, Yossi Matias, Viswanath Poosala: Fast Incremental Maintenance of Approximate Histograms. VLDB 1997: 466-475 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[11]
...
[12]
Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Arun N. Swami: Selectivity and Cost Estimation for Joins Based on Random Sampling. J. Comput. Syst. Sci. 52(3): 550-569(1996) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[13]
Yannis E. Ioannidis: Universality of Serial Histograms. VLDB 1993: 256-267 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[14]
Yannis E. Ioannidis, Stavros Christodoulakis: On the Propagation of Errors in the Size of Join Results. SIGMOD Conference 1991: 268-277 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[15]
Yannis E. Ioannidis, Stavros Christodoulakis: Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results. ACM Trans. Database Syst. 18(4): 709-748(1993) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[16]
Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[17]
H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Viswanath Poosala, Kenneth C. Sevcik, Torsten Suel: Optimal Histograms with Quality Guarantees. VLDB 1998: 275-286 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[18]
Navin Kabra, David J. DeWitt: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans. SIGMOD Conference 1998: 106-117 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[19]
Yibei Ling, Wei Sun: An Evaluation of Sampling-Based Size Estimation Methods for Selections in Database Systems. ICDE 1995: 532-539 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[20]
Michael V. Mannino, Paicheng Chu, Thomas Sager: Statistical Profile Estimation in Database Systems. ACM Comput. Surv. 20(3): 191-221(1988) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[21]
Yossi Matias, Jeffrey Scott Vitter, Min Wang: Wavelet-Based Histograms for Selectivity Estimation. SIGMOD Conference 1998: 448-459 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[22]
...
[23]
...
[24]
Viswanath Poosala: Histogram-Based Estimation Techniques in Database Systems. Ph.D. thesis, Univ. of Wisconsin-Madison 1997
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[25]
Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[26]
...
[27]
...
[28]
Wei Sun, Yibei Ling, Naphtali Rishe, Yi Deng: An Instant and Accurate Estimation Method for Joins and Selection in a Retrieval-Intensive Environment. SIGMOD Conference 1993: 79-88 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[29]
...

Copyright © Fri Mar 12 17:22:57 2010 by Michael Ley (ley@uni-trier.de)