Recovering Information from Summary Data.
Christos Faloutsos, H. V. Jagadish, Nikolaos Sidiropoulos:
Recovering Information from Summary Data.
VLDB 1997: 36-45
@inproceedings{DBLP:conf/vldb/FaloutsosJS97,
  author    = {Christos Faloutsos and
               H. V. Jagadish and
               Nikolaos Sidiropoulos},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {Recovering Information from Summary Data},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {36-45},
  ee        = {db/conf/vldb/FaloutsosJS97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Data is often stored in summarized form, as a histogram of aggregates
(COUNTs, SUMs, or AVeraGes) over specified ranges. We study how to estimate
the original detail data from the stored summary. We formulate this task
as an inverse problem, specifying a well-defined cost function
that has to be optimized under constraints. We show that our formulation
includes the uniformity and independence assumptions as a special case,
and that it can achieve better reconstruction results if we maximize the
smoothness as opposed to the uniformity.
In our experiments on real and synthetic datasets, the proposed method
almost consistently outperforms its competitor, improving the root-mean-square
error by up to 20 per cent for stock price data, and up to 90 per cent
for smoother data sets. Finally, we show how to apply this theory to a
variety of database problems that involve partial information, such as
OLAP, data warehousing and histograms in query optimization.
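The inverse-problem view described in the abstract can be illustrated with a small sketch. This is not the authors' code: it assumes one-dimensional data, disjoint SUM buckets, and a quadratic smoothness penalty (the sum of squared first differences), all of which are illustrative choices rather than details taken from the paper. The helper names (`bucket_matrix`, `reconstruct_uniform`, `reconstruct_smooth`, `rmse`) and the toy ramp data are hypothetical.

```python
import numpy as np

def bucket_matrix(n, edges):
    """Constraint matrix A: row k sums cells edges[k] .. edges[k+1]-1."""
    A = np.zeros((len(edges) - 1, n))
    for k in range(len(edges) - 1):
        A[k, edges[k]:edges[k + 1]] = 1.0
    return A

def reconstruct_uniform(A, b):
    """Minimum-norm solution of A x = b.

    For disjoint SUM buckets this spreads each sum evenly over its
    bucket, i.e. it reproduces the uniformity assumption.
    """
    return np.linalg.pinv(A) @ b

def reconstruct_smooth(A, b):
    """Minimize ||D x||^2 subject to A x = b (assumed smoothness cost).

    D is the first-difference operator; the equality-constrained
    quadratic program is solved through its KKT linear system.
    """
    m, n = A.shape
    D = np.diff(np.eye(n), axis=0)           # (n-1) x n, row i = e[i+1] - e[i]
    K = np.block([[2.0 * D.T @ D, A.T],
                  [A, np.zeros((m, m))]])    # KKT matrix
    rhs = np.concatenate([np.zeros(n), b])
    sol = np.linalg.lstsq(K, rhs, rcond=None)[0]
    return sol[:n]                           # drop the Lagrange multipliers

def rmse(estimate, truth):
    """Root-mean-square error between two equal-length vectors."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))

# Toy detail data (hypothetical): a linear ramp summarized by three SUM buckets.
x_true = np.arange(12, dtype=float)
edges = [0, 4, 8, 12]
A = bucket_matrix(len(x_true), edges)
b = A @ x_true                               # the stored summary

x_uniform = reconstruct_uniform(A, b)
x_smooth = reconstruct_smooth(A, b)

print("uniform RMSE:", rmse(x_uniform, x_true))
print("smooth  RMSE:", rmse(x_smooth, x_true))
```

Both reconstructions satisfy the stored sums exactly; they differ only in the cost function used to pick one solution among the many that are consistent with the summary. On smooth toy data such as this ramp, the smoothness cost gives the lower error, which is consistent in spirit with the abstract's claim, although the numbers here say nothing about the paper's own experiments.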
Copyright © 1997 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Printed Edition
Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.):
VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece.
Morgan Kaufmann 1997, ISBN 1-55860-470-7
Electronic Edition
From CS Dept.,
University Trier (Germany)
References
- [1]
- ...
- [2]
- ...
- [3]
- Chung-Min Chen, Nick Roussopoulos:
Adaptive Selectivity Estimation Using Query Feedback.
SIGMOD Conference 1994: 161-172
- [4]
- Stavros Christodoulakis:
Implications of Certain Assumptions in Database Performance Evaluation.
ACM Trans. Database Syst. 9(2): 163-186 (1984)
- [5]
- Divesh Srivastava, Shaul Dar, H. V. Jagadish, Alon Y. Levy:
Answering Queries with Aggregation Using Views.
VLDB 1996: 318-329
- [6]
- ...
- [7]
- ...
- [8]
- ...
- [9]
- Georg Gottlob, Roberto Zicari:
Closed World Databases Opened Through Null Values.
VLDB 1988: 50-61
- [10]
- Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh:
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total.
ICDE 1996: 152-159
- [11]
- Ashish Gupta, Venky Harinarayan, Dallan Quass:
Aggregate-Query Processing in Data Warehousing Environments.
VLDB 1995: 358-369
- [12]
- Ashish Gupta, Inderpal Singh Mumick, V. S. Subrahmanian:
Maintaining Views Incrementally.
SIGMOD Conference 1993: 157-166
- [13]
- Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman:
Implementing Data Cubes Efficiently.
SIGMOD Conference 1996: 205-216
- [14]
- Tomasz Imielinski, Witold Lipski Jr.:
Incomplete Information in Relational Databases.
J. ACM 31(4): 761-791 (1984)
- [15]
- Yannis E. Ioannidis, Viswanath Poosala:
Balancing Histogram Optimality and Practicality for Query Result Size Estimation.
SIGMOD Conference 1995: 233-244
- [16]
- H. V. Jagadish:
The INCINERATE Data Model.
ACM Trans. Database Syst. 20(1): 71-110 (1995)
- [17]
- H. V. Jagadish, Inderpal Singh Mumick, Abraham Silberschatz:
View Maintenance Issues for the Chronicle Data Model.
PODS 1995: 113-124
- [18]
- David G. Luenberger:
Introduction to Linear and Nonlinear Programming.
Addison-Wesley 1973
- [19]
- Francesco M. Malvestuto:
A Universal-Scheme Approach to Statistical Databases Containing Homogeneous Summary Tables.
ACM Trans. Database Syst. 18(4): 678-708 (1993)
- [20]
- M. Muralikrishna, David J. DeWitt:
Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries.
SIGMOD Conference 1988: 28-36
- [21]
- Wee Keong Ng, Chinya V. Ravishankar:
Information Synthesis in Statistical Databases.
CIKM 1995: 355-361
- [22]
- William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery:
Numerical Recipes in C, 2nd Edition.
Cambridge University Press 1992
- [23]
- Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price:
Access Path Selection in a Relational Database Management System.
SIGMOD Conference 1979: 23-34
- [24]
- Chung-Dak Shum, Richard R. Muntz:
An Information-Theoretic Study on Aggregate Responses.
VLDB 1988: 479-490
- [25]
- Yannis Theodoridis, Timos K. Sellis:
A Model for the Prediction of R-tree Performance.
PODS 1996: 161-171
- [26]
- ...
- [27]
- Andreas S. Weigend, Neil A. Gershenfeld:
Time Series Prediction: Forecasting the Future and Understanding the Past.
Addison-Wesley 1994, ISBN 0-201-62601-2
- [28]
- Jennifer Widom:
Research Problems in Data Warehousing.
CIKM 1995: 25-30
Copyright © Fri Mar 12 17:22:55 2010
by Michael Ley (ley@uni-trier.de)