Semantic Compression and Pattern Extraction with Fascicles.
H. V. Jagadish, J. Madar, Raymond T. Ng:
Semantic Compression and Pattern Extraction with Fascicles.
VLDB 1999: 186-198@inproceedings{DBLP:conf/vldb/JagadishMN99,
author = {H. V. Jagadish and
J. Madar and
Raymond T. Ng},
editor = {Malcolm P. Atkinson and
Maria E. Orlowska and
Patrick Valduriez and
Stanley B. Zdonik and
Michael L. Brodie},
title = {Semantic Compression and Pattern Extraction with Fascicles},
booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
UK},
publisher = {Morgan Kaufmann},
year = {1999},
isbn = {1-55860-615-7},
pages = {186-198},
ee = {db/conf/vldb/JagadishMN99.html},
crossref = {DBLP:conf/vldb/99},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Often many recoords in a database share similar
values for several attributes. If one is able to
identify and group together records that share
similar values for some - even if not all - attributes,
one can both obtain a more parsimonious representation
of the data, and gain useful insight into the data from
a mining perspective.
In this paper, we introduce the notion of fascicles.
A fascicle F(k,t) is a subset of records that have
k compact attributes. An attribute A of a
collection F of records is compact if the width of
the range of A-values (for numeric attributes) or
the number of distinct A-values (for categorial
attributes) of all the records in F does not exceed t.
We introduce and study two problems related to fascicles.
First, we consider how to find fascicles such that the total
storage of the relation is minimized.
Second, we study how best to extract fascicles whose sizes
exceed a given minimum threshold (i.e., support) and that
represent patterns of maximal quality, where quality is
measured by the pair (k,t). We develop algorithms
to attack both of the above problems.
We show that these two problems are very hard to solve
optimally. But we demonstrate empirically that good
solutions can be obtained using our algorithms.
Copyright © 1999 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Online Paper
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.):
VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK.
Morgan Kaufmann 1999, ISBN 1-55860-615-7
Contents
References
- [1]
- Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan:
Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications.
SIGMOD Conference 1998: 94-105
- [2]
- Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami:
Mining Association Rules between Sets of Items in Large Databases.
SIGMOD Conference 1993: 207-216
- [3]
- Rakesh Agrawal, Ramakrishnan Srikant:
Fast Algorithms for Mining Association Rules in Large Databases.
VLDB 1994: 487-499
- [4]
- Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik:
The New Jersey Data Reduction Report.
IEEE Data Eng. Bull. 20(4): 3-45(1997)
- [5]
- Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur:
Dynamic Itemset Counting and Implication Rules for Market Basket Data.
SIGMOD Conference 1997: 255-264
- [6]
- Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu:
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.
KDD 1996: 226-231
- [7]
- Christos Faloutsos, King-Ip Lin:
FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets.
SIGMOD Conference 1995: 163-174
- [8]
- ...
- [9]
- ...
- [10]
- Sudipto Guha, Rajeev Rastogi, Kyuseok Shim:
CURE: An Efficient Clustering Algorithm for Large Databases.
SIGMOD Conference 1998: 73-84
- [11]
- Theodore Johnson, Tamraparni Dasu:
Comparing Massive High-Dimensional Data Sets.
KDD 1998: 229-233
- [12]
- ...
- [13]
- Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, A. Inkeri Verkamo:
Finding Interesting Rules from Large Sets of Discovered Association Rules.
CIKM 1994: 401-407
- [14]
- Flip Korn, H. V. Jagadish, Christos Faloutsos:
Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences.
SIGMOD Conference 1997: 289-300
- [15]
- ...
- [16]
- Heikki Mannila, Hannu Toivonen:
Levelwise Search and Borders of Theories in Knowledge Discovery.
Data Min. Knowl. Discov. 1(3): 241-258(1997)
- [17]
- Raymond T. Ng, Jiawei Han:
Efficient and Effective Clustering Methods for Spatial Data Mining.
VLDB 1994: 144-155
- [18]
- Wee Keong Ng, Chinya V. Ravishankar:
Block-Oriented Compression Techniques for Large Statistical Databases.
IEEE Trans. Knowl. Data Eng. 9(2): 314-328(1997)
- [19]
- Frank Olken, Doron Rotem:
Rearranging Data to Maximize the Efficiency of Compression.
PODS 1986: 78-90
- [20]
- Michael B. Smyth:
Power Domains.
J. Comput. Syst. Sci. 16(1): 23-36(1978)
- [21]
- Tian Zhang, Raghu Ramakrishnan, Miron Livny:
BIRCH: An Efficient Data Clustering Method for Very Large Databases.
SIGMOD Conference 1996: 103-114
Copyright © Fri Mar 12 17:22:57 2010
by Michael Ley (ley@uni-trier.de)