Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values.
Clifford A. Lynch:
VLDB 1988: 240-251
@inproceedings{DBLP:conf/vldb/Lynch88,
author = {Clifford A. Lynch},
editor = {Fran\c{c}ois Bancilhon and
David J. DeWitt},
title = {Selectivity Estimation and Query Optimization in Large Databases
with Highly Skewed Distribution of Column Values},
booktitle = {Fourteenth International Conference on Very Large Data Bases,
August 29 - September 1, 1988, Los Angeles, California, USA,
Proceedings},
publisher = {Morgan Kaufmann},
year = {1988},
isbn = {0-934613-75-3},
pages = {240-251},
ee = {db/conf/vldb/Lynch88.html},
crossref = {DBLP:conf/vldb/88},
bibsource = {DBLP,}
}
When column values in a large database follow highly skewed distributions (such as the Zipf distributions typically found in textual databases), query optimizers in current relational systems often fail to choose optimal query plans, even for simple single-relation queries.
The major cause of these optimization failures is incorrect predicate selectivity estimation; the likelihood and cost of such errors are quantified.
A scheme for adding user-defined selectivity estimators to a relational DBMS is proposed.
The paper defines a series of new selectivity estimation methods that work well with highly skewed value distributions and then compares them to currently used methods such as uniform approximation and histograms.
Empirical data from a large bibliographic database is used throughout the analyses in this paper.
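As an illustration only (these are not the new estimators defined in the paper), the following sketch builds a synthetic Zipf-distributed column and compares two classical estimators for an equality predicate "col = v", the uniform approximation and an equi-width histogram, against the true frequencies. The Zipf parameters, the 20-bucket histogram, the q-error metric, and the ESTIMATOR_FACTORIES registry (a hypothetical stand-in for a pluggable, user-defined estimator hook) are all assumptions made for this example.

```python
from typing import Callable, Dict

# An estimator maps the constant in a predicate "col = v" to an estimated row
# count; dividing by the table size gives the predicate selectivity.
Estimator = Callable[[int], float]


def zipf_counts(n_distinct: int, top_count: int) -> Dict[int, int]:
    """Synthetic Zipf (s = 1) column: value v occurs about top_count / v times."""
    return {v: max(1, round(top_count / v)) for v in range(1, n_distinct + 1)}


def uniform_estimator(counts: Dict[int, int]) -> Estimator:
    """Uniform approximation: every value is assumed to occur rows / distinct times."""
    per_value = sum(counts.values()) / len(counts)
    return lambda v: per_value


def equiwidth_histogram_estimator(counts: Dict[int, int], buckets: int) -> Estimator:
    """Equi-width histogram over the value domain, uniform inside each bucket."""
    lo, hi = min(counts), max(counts)
    width = (hi - lo + 1) / buckets
    totals = [0] * buckets
    distinct = [0] * buckets

    def bucket_of(v: int) -> int:
        # Clamp so values outside [lo, hi] fall into the first or last bucket.
        return max(0, min(int((v - lo) / width), buckets - 1))

    for v, c in counts.items():
        b = bucket_of(v)
        totals[b] += c
        distinct[b] += 1

    def estimate(v: int) -> float:
        b = bucket_of(v)
        return totals[b] / distinct[b] if distinct[b] else 0.0

    return estimate


# Hypothetical plug-in point: the optimizer looks up an estimator factory by
# name per column instead of hard-coding one method (illustrative only).
ESTIMATOR_FACTORIES: Dict[str, Callable[[Dict[int, int]], Estimator]] = {
    "uniform": uniform_estimator,
    "histogram20": lambda counts: equiwidth_histogram_estimator(counts, 20),
}


def q_error(estimate: float, actual: float) -> float:
    """Multiplicative error max(est/actual, actual/est); 1.0 means exact."""
    return max(estimate / actual, actual / estimate)


if __name__ == "__main__":
    counts = zipf_counts(n_distinct=10_000, top_count=100_000)
    total = sum(counts.values())
    for name, factory in ESTIMATOR_FACTORIES.items():
        est = factory(counts)
        for v in (1, 10, 5_000):
            print(f"{name:12} v={v:<6} actual={counts[v]:<7} "
                  f"est={est(v):10.1f} sel={est(v) / total:.6f} "
                  f"q-error={q_error(est(v), counts[v]):.1f}")
```

With these assumed parameters, the uniform estimate for the most frequent value is off by roughly three orders of magnitude; the histogram reduces that error but still misses by well over an order of magnitude at the head of the distribution, while it does much better on tail values. Making the estimator a per-column plug-in, rather than a fixed formula, is what allows a skew-aware method to be swapped in.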
Printed Edition
François Bancilhon, David J. DeWitt (Eds.):
Fourteenth International Conference on Very Large Data Bases, August 29 - September 1, 1988, Los Angeles, California, USA, Proceedings.
Morgan Kaufmann 1988, ISBN 0-934613-75-3
