Histogram-Based Approximation of Set-Valued Query-Answers.

Yannis E. Ioannidis, Viswanath Poosala: Histogram-Based Approximation of Set-Valued Query-Answers. VLDB 1999: 174-185

@inproceedings{DBLP:conf/vldb/IoannidisP99,
  author    = {Yannis E. Ioannidis and
               Viswanath Poosala},
  editor    = {Malcolm P. Atkinson and
               Maria E. Orlowska and
               Patrick Valduriez and
               Stanley B. Zdonik and
               Michael L. Brodie},
  title     = {Histogram-Based Approximation of Set-Valued Query-Answers},
  booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
               Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
               UK},
  publisher = {Morgan Kaufmann},
  year      = {1999},
  isbn      = {1-55860-615-7},
  pages     = {174-185},
  ee        = {db/conf/vldb/IoannidisP99.html},
  crossref  = {DBLP:conf/vldb/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Answering queries approximately has recently been proposed as a way to reduce query response times in on-line decision support systems, when the precise answer is not necessary or early feedback is helpful. Most of the work in this area uses sampling-based techniques and handles aggregate queries, ignoring queries that return relations as answers. In this paper, we extend the scope of approximate query answering to general queries. We propose a novel and intuitive error measure for quantifying the error in an approximate query answer, which can be a multiset in general. We also study the use of histograms in approximate query answering as an alternative to sampling. In that direction, we develop a histogram algebra and demonstrate how complex SQL queries on a database may be translated into algebraic operations on the corresponding histograms. Finally, we present the results of an initial set of experiments where various types of histograms and sampling are compared with respect to their effectiveness in approximate query answering as captured by the introduced error measure. The results indicate that the MaxDiff(V,A) histograms provide quality approximations for both set-valued and aggregate queries, while sampling is competitive mainly for aggregate queries with no join operators.

Copyright © 1999 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Online Paper

Download PDF file (www.vldb.org, Darmstadt, Germany)
Download PDF file (www.acm.org, New York, USA)

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Windows: Click the letter of your CD drive
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Mac: Click here
UNIX/LINUX: mount the DVD and click on the path of your mount point:
/Anthology/aDVD1 or /dvd

Printed Edition

Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.): VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK. Morgan Kaufmann 1999, ISBN 1-55860-615-7
Contents

References

[1]: Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy: Join Synopses for Approximate Query Answering. SIGMOD Conference 1999: 275-286
[2]: Peter Buneman, Susan B. Davidson, Aaron Watters: A Semantics for Complex Objects and Approximate Queries. PODS 1988: 305-314
[3]: Chung-Min Chen, Nick Roussopoulos: Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994: 161-172
[4]: Phillip B. Gibbons, Yossi Matias: New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998: 331-342
[5]: Phillip B. Gibbons, Yossi Matias, Viswanath Poosala: Fast Incremental Maintenance of Approximate Histograms. VLDB 1997: 466-475
[6]: Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang: Online Aggregation. SIGMOD Conference 1997: 171-182
[7]: ...
[8]: ...
[9]: Robert Kooi: The Optimization of Queries in Relational Databases. Ph.D. thesis, Case Western Reserve University 1980
[10]: Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11
[11]: Amihai Motro: Query Generalization: A Method for Interpreting Null Answers. Expert Database Workshop 1984: 597-616
[12]: Gultekin Özsoyoglu, Kaizheng Du, Sujatha Guru Swamy, Wen-Chi Hou: Processing Real-Time, Non-Aggregate Queries with Time-Constraints in CASE-DB. ICDE 1992: 410-417
[13]: Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276
[14]: Viswanath Poosala, Venkatesh Ganti: Fast Approximate Answers to Aggregate Queries on a Data Cube. SSDBM 1999: 24-33
[15]: Viswanath Poosala, Venkatesh Ganti: Fast Approximate Query Answering Using Precomputed Statistics. ICDE 1999: 252
[16]: Viswanath Poosala, Yannis E. Ioannidis: Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB 1997: 486-495
[17]: Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305
[18]: ...
[19]: Jeffrey Scott Vitter, Min Wang, Balakrishna R. Iyer: Data Cube Approximation and Histograms via Wavelets. CIKM 1998: 96-104
[20]: Susan V. Vrbsky, Jane W.-S. Liu: APPROXIMATE - A Query Processor that Produces Monotonically Improving Approximate Answers. IEEE Trans. Knowl. Data Eng. 5(6): 1056-1068(1993)
[21]: George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949