Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing.
Viswanath Poosala, Yannis E. Ioannidis:
Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing.
VLDB 1996: 448-459@inproceedings{DBLP:conf/vldb/PoosalaI96,
author = {Viswanath Poosala and
Yannis E. Ioannidis},
editor = {T. M. Vijayaraman and
Alejandro P. Buchmann and
C. Mohan and
Nandlal L. Sarda},
title = {Estimation of Query-Result Distribution and its Application in
Parallel-Join Load Balancing},
booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
publisher = {Morgan Kaufmann},
year = {1996},
isbn = {1-55860-382-4},
pages = {448-459},
ee = {db/conf/vldb/PoosalaI96.html},
crossref = {DBLP:conf/vldb/96},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Many commercial database systems use some form of statistics, typically
histograms, to summarize the contents of relations and permit efficient
estimation of required quantities. While there has been considerable work
done on identifying good histograms for the estimation of query-result
sizes, little attention has been paid to the estimation of the data
distribution of the result, which is of importance in query optimization.
In this paper, we prove that the optimal histogram for estimating the size
of the result of a join operator is optimal for estimating its data
distribution as well. We also study the effectiveness of these optimal
histograms in the context of an important application that requires
estimates for the data distribution of a query result: load-balancing for
parallel Hybrid hash joins. We derive a cost formula to capture the
effect of data skew in both the input and output relations on the load and
use the optimal histograms to estimate this cost most accurately. We have
developed and implemented a load balancing algorithm using these
histograms on a simulator for the Gamma parallel database system. The
experiments establish the superiority of this approach compared to earlier
ones in handling all kinds and levels of skew while incurring negligible
overhead.
Copyright © 1996 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Online Paper
CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ...
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.):
VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India.
Morgan Kaufmann 1996, ISBN 1-55860-382-4
Contents
Electronic Edition
References
- [B+90]
- Haran Boral, William Alexander, Larry Clay, George P. Copeland, Scott Danforth, Michael J. Franklin, Brian E. Hart, Marc G. Smith, Patrick Valduriez:
Prototyping Bubba, A Highly Parallel Database System.
IEEE Trans. Knowl. Data Eng. 2(1): 4-24(1990)
- [Bro94]
- ...
- [Chr84]
- Stavros Christodoulakis:
Implications of Certain Assumptions in Database Performance Evaluation.
ACM Trans. Database Syst. 9(2): 163-186(1984)
- [DGG+86]
- David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna:
GAMMA - A High Performance Dataflow Database Machine.
VLDB 1986: 228-237
- [DGS+90]
- David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen:
The Gamma Database Machine Project.
IEEE Trans. Knowl. Data Eng. 2(1): 44-62(1990)
- [DNSS92]
- David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri:
Practical Skew Handling in Parallel Joins.
VLDB 1992: 27-40
- [H+90]
- Laura M. Haas, Walter Chang, Guy M. Lohman, John McPherson, Paul F. Wilms, George Lapis, Bruce G. Lindsay, Hamid Pirahesh, Michael J. Carey, Eugene J. Shekita:
Starburst Mid-Flight: As the Dust Clears.
IEEE Trans. Knowl. Data Eng. 2(1): 143-160(1990)
- [HL91]
- Kien A. Hua, Chiang Lee:
Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning.
VLDB 1991: 525-535
- [HY95]
- Kien A. Hua, Wallapak Tavanapong, Honesty C. Young:
A Performance Evaluation of Load Balancing Techniques for Join Operations on Multicomputer Database Systems.
ICDE 1995: 44-51
- [IC91]
- Yannis E. Ioannidis, Stavros Christodoulakis:
On the Propagation of Errors in the Size of Join Results.
SIGMOD Conference 1991: 268-277
- [Ioa93]
- Yannis E. Ioannidis:
Universality of Serial Histograms.
VLDB 1993: 256-267
- [IP95a]
- Yannis E. Ioannidis, Viswanath Poosala:
Balancing Histogram Optimality and Practicality for Query Result Size Estimation.
SIGMOD Conference 1995: 233-244
- [IP95b]
- Yannis E. Ioannidis, Viswanath Poosala:
Histogram-Based Solutions to Diverse Database Estimation Problems.
IEEE Data Eng. Bull. 18(3): 10-18(1995)
- [KO90]
- Masaru Kitsuregawa, Yasushi Ogawa:
Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC).
VLDB 1990: 210-221
- [LY88]
- M. Seetha Lakshmi, Philip S. Yu:
Effect of Skew on Join Performance in Parallel Architectures.
DPDS 1988: 107-120
- [MD88]
- M. Muralikrishna, David J. DeWitt:
Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries.
SIGMOD Conference 1988: 28-36
- [PI96]
- ...
- [PIHS96]
- Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita:
Improved Histograms for Selectivity Estimation of Range Predicates.
SIGMOD Conference 1996: 294-305
- [PSC84]
- Gregory Piatetsky-Shapiro, Charles Connell:
Accurate Estimation of the Number of Tuples Satisfying a Condition.
SIGMOD Conference 1984: 256-276
- [SD89]
- Donovan A. Schneider, David J. DeWitt:
A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment.
SIGMOD Conference 1989: 110-121
- [Sha86]
- Leonard D. Shapiro:
Join Processing in Database Systems with Large Main Memories.
ACM Trans. Database Syst. 11(3): 239-264(1986)
- [SN93]
- Ambuj Shatdal, Jeffrey F. Naughton:
Using Shared Virtual Memory for Parallel Join Processing.
SIGMOD Conference 1993: 119-128
- [Sto86]
- Michael Stonebraker:
The Case for Shared Nothing.
IEEE Database Eng. Bull. 9(1): 4-9(1986)
- [Vit85]
- Jeffrey Scott Vitter:
Random Sampling with a Reservoir.
ACM Trans. Math. Softw. 11(1): 37-57(1985)
- [WDJ91]
- Christopher B. Walton, Alfred G. Dale, Roy M. Jenevein:
A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins.
VLDB 1991: 537-548
- [Zip49]
- George Kingsley Zipf:
Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology.
Addison-Wesley 1949
Copyright © Tue Mar 16 02:22:06 2010
by Michael Ley (ley@uni-trier.de)