A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins.
Christopher B. Walton, Alfred G. Dale, Roy M. Jenevein:
A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins.
VLDB 1991: 537-548
@inproceedings{DBLP:conf/vldb/WaltonDJ91,
author = {Christopher B. Walton and
Alfred G. Dale and
Roy M. Jenevein},
editor = {Guy M. Lohman and
Am\'{\i}lcar Sernadas and
Rafael Camps},
title = {A Taxonomy and Performance Model of Data Skew Effects in Parallel
Joins},
booktitle = {17th International Conference on Very Large Data Bases, September
3-6, 1991, Barcelona, Catalonia, Spain, Proceedings},
publisher = {Morgan Kaufmann},
year = {1991},
isbn = {1-55860-150-3},
pages = {537-548},
ee = {db/conf/vldb/WaltonDJ91.html},
crossref = {DBLP:conf/vldb/91},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Recent work on parallel joins and data skew has concentrated on algorithm design without considering the causes and characteristics of data skew itself.
Existing analytic models of skew do not contain enough information to fully describe data skew in parallel implementations.
Because the assumptions made about the nature of skew vary between authors, it is almost impossible to make valid comparisons of parallel algorithms.
In this paper, a taxonomy of skew effects is developed, and a new performance model is introduced.
The model is used to compare the performance of two parallel join algorithms.
Copyright © 1991 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Online Paper
CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ...
DVD Version: Load "ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
Guy M. Lohman, Amílcar Sernadas, Rafael Camps (Eds.):
17th International Conference on Very Large Data Bases, September 3-6, 1991, Barcelona, Catalonia, Spain, Proceedings.
Morgan Kaufmann 1991, ISBN 1-55860-150-3
References
- [Baru & Frieder 1989]
- Chaitanya K. Baru, Ophir Frieder:
Database Operations in a Cube-Connected Multicomputer System.
IEEE Trans. Computers 38(6): 920-927 (1989)
- [Baru et al. 1987]
- Chaitanya K. Baru, Ophir Frieder, Dilip D. Kandlur, Mark E. Segal:
Join on a Cube: Analysis, Simulation, and Implementation.
IWDM 1987: 61-74
- [Boral 1988]
- Haran Boral:
Parallelism and Data Management.
JCDKB 1988: 362-373
- [Christodoulakis 1983]
- Stavros Christodoulakis:
Estimating record selectivities.
Inf. Syst. 8(2): 105-115 (1983)
- [Copeland et al. 1988]
- George P. Copeland, William Alexander, Ellen E. Boughter, Tom W. Keller:
Data Placement In Bubba.
SIGMOD Conference 1988: 99-108
- [DeWitt 1986]
- David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna:
GAMMA - A High Performance Dataflow Database Machine.
VLDB 1986: 228-237
- [DeWitt et al. 1988]
- David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider:
A Performance Analysis of the Gamma Database Machine.
SIGMOD Conference 1988: 350-360
- [DeWitt 1990]
- David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen:
The Gamma Database Machine Project.
IEEE Trans. Knowl. Data Eng. 2(1): 44-62 (1990)
- [DeWitt et al. 1984]
- David J. DeWitt, Randy H. Katz, Frank Olken, Leonard D. Shapiro, Michael Stonebraker, David A. Wood:
Implementation Techniques for Main Memory Database Systems.
SIGMOD Conference 1984: 1-8
- [Frieder 1990]
- Ophir Frieder:
Multiprocessor Algorithms for Relational-Database Operators on Hypercube Systems.
IEEE Computer 23(11): 13-28 (1990)
- [Gerber 1986]
- ...
- [Gerber & DeWitt 1987]
- ...
- [Hu & Muntz 1989]
- ...
- [Kitsuregawa et al. 1983]
- ...
- [Lakshmi & Yu 1988]
- M. Seetha Lakshmi, Philip S. Yu:
Effect of Skew on Join Performance in Parallel Architectures.
DPDS 1988: 107-120
- [Lakshmi & Yu 1989]
- M. Seetha Lakshmi, Philip S. Yu:
Limiting Factors of Join Performance on Parallel Processors.
ICDE 1989: 488-496
- [Lynch 1988]
- Clifford A. Lynch:
Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values.
VLDB 1988: 240-251
- [Montgomery et al. 1983]
- ...
- [Omiecinski & Liu 1989]
- ...
- [Richardson et al. 1987]
- James P. Richardson, Hongjun Lu, Krishna P. Mikkilineni:
Design and Evaluation of Parallel Pipelined Join Algorithms.
SIGMOD Conference 1987: 399-409
- [Schneider 1990]
- Donovan A. Schneider, David J. DeWitt:
Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines.
VLDB 1990: 469-480
- [Schneider & DeWitt 1989]
- Donovan A. Schneider, David J. DeWitt:
A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment.
SIGMOD Conference 1989: 110-121
- [Stonebraker 1986]
- Michael Stonebraker:
The Case for Shared Nothing.
IEEE Database Eng. Bull. 9(1): 4-9 (1986)
- [Teradata 1983]
- ...
- [Walton et al. 1990]
- ...
- [Wolf et al. 1990]
- Joel L. Wolf, Daniel M. Dias, Philip S. Yu, John Turek:
An Effective Algorithm for Parallelizing Hash Joins in the Presence of Data Skew.
ICDE 1991: 200-209
- [Wolf et al. 1990]
- Joel L. Wolf, Daniel M. Dias, Philip S. Yu:
An Effective Algorithm for Parallelizing Sort Merge in the Presence of Data Skew.
DPDS 1990: 103-115
Copyright © Tue Mar 16 02:22:02 2010
by Michael Ley (ley@uni-trier.de)