Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC).
Masaru Kitsuregawa, Yasushi Ogawa:
Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC).
VLDB 1990: 210-221@inproceedings{DBLP:conf/vldb/KitsuregawaO90,
author = {Masaru Kitsuregawa and
Yasushi Ogawa},
editor = {Dennis McLeod and
Ron Sacks-Davis and
Hans-J{\"o}rg Schek},
title = {Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash
Join Method for Data Skew in the Super Database Computer (SDC)},
booktitle = {16th International Conference on Very Large Data Bases, August
13-16, 1990, Brisbane, Queensland, Australia, Proceedings},
publisher = {Morgan Kaufmann},
year = {1990},
isbn = {1-55860-149-X},
pages = {210-221},
ee = {db/conf/vldb/KitsuregawaO90.html},
crossref = {DBLP:conf/vldb/90},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
The Super Database Computer (SDC) is a high- performance relational database server for a join- intensive environment under development at Universityof Tokyo.
SDC is designed to execute a join in a highly parallel way.
Compared to other join algorithms, a hash-based algorithm is quite efficient and easily parallelized, and has been employed by many database machines.
However, in the presence of data skew, it's hard to distribute load equally among processing modules (PMs) by statically allocating buckets to PMs, as in the conventional parallelizing strategy.
Thus, performance is severly degraded.
In this paper, we propose a new parallel hash join method, the bucket spreading strategy, which is robust for data skew.
During partitioning relations, each bucket is again divided into fragments of the same size and these fragments are temporarily placed on PMs one by one.
Then each bucket is dynamically allocated to a PM which actually carries out the join of the bucket, and all fragments of the bucket are collected inthe corresponding PM.
In this way, the bucket spreading strategy evenly distributes the load among the PMs and parallelism is always fully exploited.
The architecture of SDC is designed to support the bucket spreading strategy; a mechanism which distributes the buckets flatly among the PMs is embedded in the hardware of the interconnection network.
Simulation results confirm that the bucket spreading strategy is robust for data skew and attains very good scalability.
Copyright © 1990 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Online Paper
CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ...
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
Dennis McLeod, Ron Sacks-Davis, Hans-Jörg Schek (Eds.):
16th International Conference on Very Large Data Bases, August 13-16, 1990, Brisbane, Queensland, Australia, Proceedings.
Morgan Kaufmann 1990, ISBN 1-55860-149-X
References
- [Bra84]
- Kjell Bratbergsengen:
Hashing Methods and Relational Algebra Operations.
VLDB 1984: 323-333
- [Bra89]
- Kjell Bratbergsengen, Torgrim Gjelsvik:
The Development of the CROSS8 and HC16-186 Parallel (Database) Computers.
IWDM 1989: 359-372
- [DeW84]
- David J. DeWitt, Randy H. Katz, Frank Olken, Leonard D. Shapiro, Michael Stonebraker, David A. Wood:
Implementation Techniques for Main Memory Database Systems.
SIGMOD Conference 1984: 1-8
- [DeW85]
- David J. DeWitt, Robert H. Gerber:
Multiprocessor Hash-Based Join Algorithms.
VLDB 1985: 151-164
- [DeW86]
- David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna:
GAMMA - A High Performance Dataflow Database Machine.
VLDB 1986: 228-237
- [Ger86]
- ...
- [Hir90]
- ...
- [Kit83]
- Masaru Kitsuregawa, Hidehiko Tanaka, Tohru Moto-Oka:
Application of Hash to Data Base Machine and Its Architecture.
New Generation Comput. 1(1): 63-74(1983)
- [Kit84]
- ...
- [Kit87]
- Masaru Kitsuregawa, Miyuki Nakano, Lilian Harada, Mikio Takagi:
Functional Disk System for Relational Database.
ICDE 1987: 88-95
- [Kit87b]
- Masaru Kitsuregawa, Weikang Yang, T. Suzuki, Mikio Takagi:
Design and Implementation of High Speed Pipeline Merge Sorter with Run Length Tuning Mechanism.
IWDM 1987: 89-102
- [Kit89]
- Masaru Kitsuregawa, Miyuki Nakano, Mikio Takagi:
Query Execution for Large Relations on Functional Disk Systems.
ICDE 1989: 159-167
- [Kit89b]
- ...
- [Kit89c]
- Masaru Kitsuregawa, Masaya Nakayama, Mikio Takagi:
The Effect of Bucket Size Tuning in the Dynamic Hybrid GRACE Hash Join Method.
VLDB 1989: 257-266
- [Lak88]
- M. Seetha Lakshmi, Philip S. Yu:
Effect of Skew on Join Performance in Parallel Architectures.
DPDS 1988: 107-120
- [Lak89]
- M. Seetha Lakshmi, Philip S. Yu:
Limiting Factors of Join Performance on Parallel Processors.
ICDE 1989: 488-496
- [Law75]
- ...
- [Mit89]
- ...
- [Nak88]
- Masaya Nakayama, Masaru Kitsuregawa, Mikio Takagi:
Hash-Partitioned Join Method Using Dynamic Destaging Strategy.
VLDB 1988: 468-478
- [Nec88]
- ...
- [Omi89]
- Edward Omiecinski, Eileen Tien Lin:
Hash-Based and Index-Based Join Algorithms for Cube and Ring Connected Multicomputers.
IEEE Trans. Knowl. Data Eng. 1(3): 329-343(1989)
- [Rie78]
- ...
- [Sch89]
- Donovan A. Schneider, David J. DeWitt:
A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment.
SIGMOD Conference 1989: 110-121
- [Sha86]
- Leonard D. Shapiro:
Join Processing in Database Systems with Large Main Memories.
ACM Trans. Database Syst. 11(3): 239-264(1986)
- [Sto86]
- Michael Stonebraker:
The Case for Shared Nothing.
IEEE Database Eng. Bull. 9(1): 4-9(1986)
- [Ter88]
- ...
- [Tan88]
- The Tandem Performance Group:
A Benchmark of NonStop SQL on the Debit Credit Transaction (Invited Paper).
SIGMOD Conference 1988: 337-341
- [Tur87]
- ...
- [Val84]
- Patrick Valduriez, Georges Gardarin:
Join and Semijoin Algorithms for a Multiprocessor Database Machine.
ACM Trans. Database Syst. 9(1): 133-161(1984)
Copyright © Tue Mar 16 02:22:00 2010
by Michael Ley (ley@uni-trier.de)