Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning.
Kien A. Hua, Chiang Lee:
Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning.
VLDB 1991: 525-535
@inproceedings{DBLP:conf/vldb/HuaL91,
author = {Kien A. Hua and
Chiang Lee},
editor = {Guy M. Lohman and
Am\'{\i}lcar Sernadas and
Rafael Camps},
title = {Handling Data Skew in Multiprocessor Database Computers Using
Partition Tuning},
booktitle = {17th International Conference on Very Large Data Bases, September
3-6, 1991, Barcelona, Catalonia, Spain, Proceedings},
publisher = {Morgan Kaufmann},
year = {1991},
isbn = {1-55860-150-3},
pages = {525-535},
ee = {db/conf/vldb/HuaL91.html},
crossref = {DBLP:conf/vldb/91},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
The shared-nothing multiprocessor architecture is known to be more scalable for supporting very large databases.
Compared to other join strategies, a hash-based join algorithm is particularly efficient and easily parallelized for this computation model.
However, this hardware structure is very sensitive to the data skew problem.
Unless the parallel hash join algorithm includes some load balancing mechanism, the skew effect can severely degrade system performance.
In this paper, we propose two skew avoidance techniques and one skew resolution method.
In particular, three new parallel hash join algorithms are presented.
We developed an analytical model to study the effectiveness of these algorithms.
The performance study indicates that the proposed techniques offer substantial improvement over the conventional strategies in the presence of data skew.
It is also interesting to observe that the skew avoidance techniques provide join strategies that are robust against data skew, whereas the skew resolution method offers an adaptive join strategy that outperforms the conventional algorithms for any skew condition.
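As an illustration of the data skew problem the abstract refers to, the following minimal sketch shows how plain hash partitioning can overload one node when a join key value is heavily repeated. It is not one of the algorithms proposed in the paper. The function names (`partition_loads`, `skew_factor`), the modulo hash, and the sample key distributions are all assumptions made for this example.

```python
def partition_loads(keys, num_nodes):
    """Count how many tuples each node receives under hash partitioning.

    Assumption: keys are non-negative integers and the hash function is a
    plain modulo, chosen only to keep the example deterministic.
    """
    loads = [0] * num_nodes
    for key in keys:
        loads[key % num_nodes] += 1
    return loads


def skew_factor(loads):
    """Ratio of the heaviest node's load to the mean load (1.0 = balanced)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical inputs: 1000 tuples spread over 4 nodes.
uniform_keys = list(range(1000))              # every key value occurs once
skewed_keys = [7] * 700 + list(range(300))    # one key value dominates

uniform_loads = partition_loads(uniform_keys, 4)
skewed_loads = partition_loads(skewed_keys, 4)

print(uniform_loads, skew_factor(uniform_loads))  # [250, 250, 250, 250] 1.0
print(skewed_loads, skew_factor(skewed_loads))    # [75, 75, 75, 775] 3.1
```

In a shared-nothing system the join finishes only when the most heavily loaded node finishes, so in this toy case the skewed input would take roughly three times as long as the balanced one, assuming per-node cost is proportional to tuple count.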
Copyright © 1991 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Printed Edition
Guy M. Lohman, Amílcar Sernadas, Rafael Camps (Eds.):
17th International Conference on Very Large Data Bases, September 3-6, 1991, Barcelona, Catalonia, Spain, Proceedings.
Morgan Kaufmann 1991, ISBN 1-55860-150-3
References
- [1]
- ...
- [2]
- Haran Boral, William Alexander, Larry Clay, George P. Copeland, Scott Danforth, Michael J. Franklin, Brian E. Hart, Marc G. Smith, Patrick Valduriez:
Prototyping Bubba, A Highly Parallel Database System.
IEEE Trans. Knowl. Data Eng. 2(1): 4-24 (1990)
- [3]
- David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen:
The Gamma Database Machine Project.
IEEE Trans. Knowl. Data Eng. 2(1): 44-62 (1990)
- [4]
- Susanne Englert, Jim Gray, Terrye Kocher, Praful Shah:
A Benchmark of NonStop SQL Release 2 Demonstrating Near-Linear Speedup and Scaleup on Large Databases.
SIGMETRICS 1990: 245-246
- [5]
- ...
- [6]
- ...
- [7]
- Michael Stonebraker:
The Case for Shared Nothing.
IEEE Database Eng. Bull. 9(1): 4-9 (1986)
- [8]
- M. Seetha Lakshmi, Philip S. Yu:
Effect of Skew on Join Performance in Parallel Architectures.
DPDS 1988: 107-120
- [9]
- ...
- [10]
- Donovan A. Schneider, David J. DeWitt:
A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment.
SIGMOD Conference 1989: 110-121
- [11]
- Masaru Kitsuregawa, Yasushi Ogawa:
Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC).
VLDB 1990: 210-221
- [12]
- ...
- [13]
- Kien A. Hua, Chiang Lee:
An Adaptive Data Placement Scheme for Parallel Database Computer Systems.
VLDB 1990: 493-506
- [14]
- ...
- [15]
- M. Seetha Lakshmi, Philip S. Yu:
Limiting Factors of Join Performance on Parallel Processors.
ICDE 1989: 488-496
- [16]
- ...
- [17]
- ...
- [18]
- ...
- [19]
- ...
- [20]
- David J. DeWitt, Randy H. Katz, Frank Olken, Leonard D. Shapiro, Michael Stonebraker, David A. Wood:
Implementation Techniques for Main Memory Database Systems.
SIGMOD Conference 1984: 1-8