Practical Skew Handling in Parallel Joins.
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri:
Practical Skew Handling in Parallel Joins.
VLDB 1992: 27-40@inproceedings{DBLP:conf/vldb/DeWittNSS92,
author = {David J. DeWitt and
Jeffrey F. Naughton and
Donovan A. Schneider and
S. Seshadri},
editor = {Li-Yan Yuan},
title = {Practical Skew Handling in Parallel Joins},
booktitle = {18th International Conference on Very Large Data Bases, August
23-27, 1992, Vancouver, Canada, Proceedings},
publisher = {Morgan Kaufmann},
year = {1992},
isbn = {1-55860-151-1},
pages = {27-40},
ee = {db/conf/vldb/DeWittNSS92.html},
crossref = {DBLP:conf/vldb/92},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on non-skewed data.
The main idea is to use multiple algorithms, each specialized for a different degree of skew, and to use a small sample of the relations being joined to determine which algorithm is appropriate.
We developed, implemented, and experimented with four new skew-handling parallel join algorithms; one, which we call virtual processor range partitioning, was the clear winner in high skew cases, while traditional hybrid hash joinwas the clear winner in lower skew or no skew cases.
We present experimental results from an implementation of all four algorithms on the Gamma parallel database machine.
To our knowledge, these are the first reported skew-handling numbers from an actual implementation.
Copyright © 1992 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Online Paper
CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ...
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
Li-Yan Yuan (Ed.):
18th International Conference on Very Large Data Bases, August 23-27, 1992, Vancouver, Canada, Proceedings.
Morgan Kaufmann 1992, ISBN 1-55860-151-1
Contents
References
- [BFKS87]
- Chaitanya K. Baru, Ophir Frieder, Dilip D. Kandlur, Mark E. Segal:
Join on a Cube: Analysis, Simulation, and Implementation.
IWDM 1987: 61-74
- [BGMP79]
- Mike W. Blasgen, Jim Gray, Michael F. Mitoma, Thomas G. Price:
The Convoy Phenomenon.
Operating Systems Review 13(2): 20-25(1979)
- [Bra87]
- Kjell Bratbergsengen:
Algebra Operations on a Parallel Computer - Performance Evaluation.
IWDM 1987: 415-428
- [CDKK85]
- Hong-Tai Chou, David J. DeWitt, Randy H. Katz, Anthony C. Klug:
Design and Implementation of the Wisconsin Storage System.
Softw., Pract. Exper. 15(10): 943-962(1985)
- [Coc77]
- William G. Cochran:
Sampling Techniques, 3rd Edition.
John Wiley 1977, ISBN 0-471-16240-X
- [CW79]
- Larry Carter, Mark N. Wegman:
Universal Classes of Hash Functions.
J. Comput. Syst. Sci. 18(2): 143-154(1979)
- [DG85]
- David J. DeWitt, Robert H. Gerber:
Multiprocessor Hash-Based Join Algorithms.
VLDB 1985: 151-164
- [DG92]
- David J. DeWitt, Jim Gray:
Parallel Database Systems: The Future of High Performance Database Systems.
Commun. ACM 35(6): 85-98(1992)
- [DGG++86]
- David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna:
GAMMA - A High Performance Dataflow Database Machine.
VLDB 1986: 228-237
- [DGS88]
- David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider:
A Performance Analysis of the Gamma Database Machine.
SIGMOD Conference 1988: 350-360
- [DGS+90]
- David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen:
The Gamma Database Machine Project.
IEEE Trans. Knowl. Data Eng. 2(1): 44-62(1990)
- [DNS91a]
- David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider:
An Evaluation of Non-Equijoin Algorithms.
VLDB 1991: 443-452
- [DNS91b]
- David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider:
Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting.
PDIS 1991: 280-291
- [ESW78]
- Robert S. Epstein, Michael Stonebraker, Eugene Wong:
Distributed Query Processing in a Relational Data Base System.
SIGMOD Conference 1978: 169-180
- [Gra69]
- Ronald L. Graham:
Bounds on Multiprocessing Timing Anomalies.
SIAM Journal of Applied Mathematics 17(2): 416-429(1969)
- [HL91]
- Kien A. Hua, Chiang Lee:
Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning.
VLDB 1991: 525-535
- [KO90]
- Masaru Kitsuregawa, Yasushi Ogawa:
Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC).
VLDB 1990: 210-221
- [KTMo83]
- Masaru Kitsuregawa, Hidehiko Tanaka, Tohru Moto-Oka:
Application of Hash to Data Base Machine and Its Architecture.
New Generation Comput. 1(1): 63-74(1983)
- [LY90]
- M. Seetha Lakshmi, Philip S. Yu:
Effectiveness of Parallel Joins.
IEEE Trans. Knowl. Data Eng. 2(4): 410-424(1990)
- [Omi91]
- Edward Omiecinski:
Performance Analysis of a Load Balancing Hash-Join Algorithm for a Shared Memory Multiprocessor.
VLDB 1991: 375-385
- [OR89]
- Frank Olken, Doron Rotem:
Random Sampling from B+ Trees.
VLDB 1989: 269-277
- [ORX90]
- Frank Olken, Doron Rotem, Ping Xu:
Random Sampling from Hash Files.
SIGMOD Conference 1990: 375-386
- [SD89]
- Donovan A. Schneider, David J. DeWitt:
A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment.
SIGMOD Conference 1989: 110-121
- [SN92]
- S. Seshadri, Jeffrey F. Naughton:
Sampling Issues in Parallel Database Systems.
EDBT 1992: 328-343
- [Sto86]
- Michael Stonebraker:
The Case for Shared Nothing.
IEEE Database Eng. Bull. 9(1): 4-9(1986)
- [WDJ91]
- Christopher B. Walton, Alfred G. Dale, Roy M. Jenevein:
A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins.
VLDB 1991: 537-548
- [WDYT90]
- Joel L. Wolf, Daniel M. Dias, Philip S. Yu, John Turek:
An Effective Algorithm for Parallelizing Hash Joins in the Presence of Data Skew.
ICDE 1991: 200-209
Copyright © Fri Mar 12 17:22:51 2010
by Michael Ley (ley@uni-trier.de)