@inproceedings{DBLP:conf/vldb/AgarwalADGNRS96, author = {Sameet Agarwal and Rakesh Agrawal and Prasad Deshpande and Ashish Gupta and Jeffrey F. Naughton and Raghu Ramakrishnan and Sunita Sarawagi}, editor = {T. M. Vijayaraman and Alejandro P. Buchmann and C. Mohan and Nandlal L. Sarda}, title = {On the Computation of Multidimensional Aggregates}, booktitle = {VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India}, publisher = {Morgan Kaufmann}, year = {1996}, isbn = {1-55860-382-4}, pages = {506-521}, ee = {db/conf/vldb/AgarwalADGNRS96.html}, crossref = {DBLP:conf/vldb/96}, bibsource = {DBLP, http://dblp.uni-trier.de} }

At the heart of all OLAP or multidimensional
data analysis is the ability to simultaneously
aggregate across many sets of dimensions.
Computing multidimensional aggregates is a performance
bottleneck for these applications.
This paper presents fast algorithms for computing a
collection of group-bys.
We focus on a special case of the aggregation problem -
computation of the **CUBE** operator.
The **CUBE** operator requires computing group-bys on all possible
combinations of a list of attributes, and is equivalent to the union
of a number of standard group-by operations.
We show how the structure of **CUBE** computations
can be viewed in terms of a hierarchy of group-by operations.
Our algorithms extend sort-based and hash-based grouping methods
with several optimizations, like combining common operations across
multiple group-bys, caching, and using pre-computed group-bys for
computing other group-bys.
Empirical evaluation shows that the resulting algorithms
give much better performance compared to straightforward methods.

This paper combines work done concurrently on computing the data cube by two different teams as reported in [SAG96] and [DANR96].

*Copyright © 1996 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.*

- Download PDF file (www.vldb.org, Darmstadt, Germany)
- Download PDF file (www.acm.org, New York, USA)

- Windows: Click the letter of your CD drive

A B C**D E**F G H I J K L M N O P Q R S T U V W X Y Z - Mac: Click here
- UNIX/LINUX: mount the CD and click on the path of your
*mount point*:

/Anthology/vldb8997 or /cdrom

- Windows: Click the letter of your CD drive

A B C**D E**F G H I J K L M N O P Q R S T U V W X Y Z - Mac: Click here
- UNIX/LINUX: mount the DVD and click on the path of your
*mount point*:

/Anthology/aDVD1 or /dvd

Contents

- From CS Dept., University Trier (Germany)

- [CM89]
- Meng Chang Chen, Lawrence McNamee: On the Data Model and Access Method of Summary Data Management. IEEE Trans. Knowl. Data Eng. 1(4): 519-529(1989)
- [GBLP96]
- Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE 1996: 152-159
- [GJ79]
- M. R. Garey, David S. Johnson:
Computers and Intractability: A Guide to the Theory of NP-Completeness.
W. H. Freeman 1979, ISBN 0-7167-1044-7

- [GLS94]
- Goetz Graefe, Ann Linville, Leonard D. Shapiro: Sort versus Hash Revisited. IEEE Trans. Knowl. Data Eng. 6(6): 934-944(1994)
- [Gra93]
- Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170(1993)
- [HNSS95]
- Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes: Sampling-Based Estimation of the Number of Distinct Values of an Attribute. VLDB 1995: 311-322
- [JS96]
- ...
- [PS82]
- Christos H. Papadimitriou, Kenneth Steiglitz:
Combinatorial Optimization: Algorithms and Complexity.
Prentice-Hall 1982, ISBN 0-13-152462-3

- [FELL57]
- ...
- [HRU96]
- Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman: Implementing Data Cubes Efficiently. SIGMOD Conference 1996: 205-216
- [GHRU96]
- ...
- [EPST79]
- ...
- [SN95]
- Ambuj Shatdal, Jeffrey F. Naughton: Adaptive Parallel Aggregation Algorithms. SIGMOD Conference 1995: 104-114
- [SDNR96]
- Amit Shukla, Prasad Deshpande, Jeffrey F. Naughton, Karthikeyan Ramasamy: Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies. VLDB 1996: 522-531
- [SAG96]
- ...
- [DANR96]
- ...
- [Mic92]
- Zbigniew Michalewicz:
Statistical and Scientific Databases.
Ellis Horwood 1992

- [NC95]
- ...
- [CODD93]
- ...
- [FINK]
- ...
- [Sho82]
- Arie Shoshani: Statistical Databases: Characteristics, Problems, and some Solutions. VLDB 1982: 208-222
- [SR96]
- ...
- [STG95]
- ...
- [STL89]
- Jaideep Srivastava, Jack S. Eddy Tan, Vincent Y. Lum: TBSAM: An Access Method for Efficient Processing of Statistical Queries. IEEE Trans. Knowl. Data Eng. 1(4): 414-423(1989)
- [WELD95]
- ...