Clustering Categorical Data: An Approach Based on Dynamical Systems.
David Gibson, Jon M. Kleinberg, Prabhakar Raghavan:
Clustering Categorical Data: An Approach Based on Dynamical Systems.
VLDB 1998: 311-322
@inproceedings{DBLP:conf/vldb/GibsonKR98,
author = {David Gibson and
Jon M. Kleinberg and
Prabhakar Raghavan},
editor = {Ashish Gupta and
Oded Shmueli and
Jennifer Widom},
title = {Clustering Categorical Data: An Approach Based on Dynamical Systems},
booktitle = {VLDB'98, Proceedings of 24th International Conference on Very
Large Data Bases, August 24-27, 1998, New York City, New York,
USA},
publisher = {Morgan Kaufmann},
year = {1998},
isbn = {1-55860-566-5},
pages = {311-322},
ee = {db/conf/vldb/GibsonKR98.html},
crossref = {DBLP:conf/vldb/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data.
By "categorical data," we mean tables with fields that cannot be naturally ordered by a metric - e.g., the names of producers of automobiles, or the names of products offered by a manufacturer.
Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset.
Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems.
We discuss experiments on a variety of tables of synthetic and real data; we find that our iterative methods converge quickly to prominently correlated values of various categorical fields.
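As an illustration only, the iterative weight propagation described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `propagate_weights`, the additive combining rule, the per-field Euclidean normalization, the fixed iteration count, and the toy automobile table are all hypothetical choices made here for the example.

```python
import math
from collections import defaultdict


def propagate_weights(rows, iterations=20):
    """Iteratively propagate weights between co-occurring categorical values.

    rows: list of tuples, one categorical value per field.
    Returns a dict mapping (field_index, value) to a weight.
    Assumption: an additive combining rule and per-field normalization;
    the abstract does not specify either.
    """
    # Each distinct (field, value) pair is a node carrying a weight.
    nodes = {(i, v) for row in rows for i, v in enumerate(row)}
    weights = {node: 1.0 for node in nodes}

    for _ in range(iterations):
        updated = defaultdict(float)
        for row in rows:
            row_nodes = list(enumerate(row))
            total = sum(weights[n] for n in row_nodes)
            for n in row_nodes:
                # A value receives the combined weight of the other
                # values that appear with it in the same row.
                updated[n] += total - weights[n]

        # Rescale each field so its weights have unit Euclidean length.
        norms = defaultdict(float)
        for (field, _value), w in updated.items():
            norms[field] += w * w
        weights = {
            (field, value): (w / math.sqrt(norms[field]) if norms[field] > 0 else 0.0)
            for (field, value), w in updated.items()
        }
    return weights


# Hypothetical toy table: (producer, model).
rows = [
    ("Honda", "Civic"),
    ("Honda", "Accord"),
    ("Toyota", "Camry"),
    ("Honda", "Civic"),
]
weights = propagate_weights(rows)
for node in sorted(weights, key=weights.get, reverse=True):
    print(node, round(weights[node], 4))
```

With an additive rule the update is linear apart from the normalization step, so it behaves much like a power iteration on the co-occurrence structure; other combining rules (for example a product of the co-occurring weights) would give a genuinely non-linear system of the kind the abstract refers to.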
Copyright © 1998 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Printed Edition
Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.):
VLDB'98, Proceedings of 24th International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA.
Morgan Kaufmann 1998, ISBN 1-55860-566-5
References
- [1]
- Rakesh Agrawal, Heikki Mannila, Ramakrishnan Srikant, Hannu Toivonen, A. Inkeri Verkamo:
Fast Discovery of Association Rules.
Advances in Knowledge Discovery and Data Mining 1996: 307-328
- [2]
- Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami:
Mining Association Rules between Sets of Items in Large Databases.
SIGMOD Conference 1993: 207-216
- [3]
- ...
- [4]
- ...
- [5]
- ...
- [6]
- Avrim Blum, Joel Spencer:
Coloring Random and Semi-Random k-Colorable Graphs.
J. Algorithms 19(2): 204-234(1995)
- [7]
- Ravi B. Boppana:
Eigenvalues and Graph Bisection: An Average-Case Analysis (Extended Abstract).
FOCS 1987: 280-285
- [8]
- Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur:
Dynamic Itemset Counting and Implication Rules for Market Basket Data.
SIGMOD Conference 1997: 255-264
- [9]
- ...
- [10]
- ...
- [11]
- Tzi-cker Chiueh:
Content-Based Image Indexing.
VLDB 1994: 582-593
- [12]
- ...
- [13]
- Gautam Das, Heikki Mannila, Pirjo Ronkainen:
Similarity of Attributes by External Probes.
KDD 1998: 23-29
- [14]
- Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, Richard A. Harshman:
Indexing by Latent Semantic Analysis.
JASIS 41(6): 391-407(1990)
- [15]
- ...
- [16]
- ...
- [17]
- ...
- [18]
- ...
- [19]
- ...
- [20]
- Myron Flickner, Harpreet S. Sawhney, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, Peter Yanker:
Query by Image and Video Content: The QBIC System.
IEEE Computer 28(9): 23-32(1995)
- [21]
- M. R. Garey, David S. Johnson:
Computers and Intractability: A Guide to the Theory of NP-Completeness.
W. H. Freeman 1979, ISBN 0-7167-1044-7
- [22]
- ...
- [23]
- Eui-Hong Han, George Karypis, Vipin Kumar, Bamshad Mobasher:
Clustering Based On Association Rule Hypergraphs.
DMKD 1997: 0-
- [24]
- ...
- [25]
- Zhexue Huang:
A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining.
DMKD 1997: 0-
- [26]
- ...
- [27]
- ...
- [28]
- ...
- [29]
- ...
- [30]
- ...
- [31]
- Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo:
Discovering Frequent Episodes in Sequences.
KDD 1995: 210-215
- [32]
- ...
- [33]
- ...
- [34]
- ...
- [35]
- ...
- [36]
- ...
- [37]
- Hannu Toivonen:
Sampling Large Databases for Association Rules.
VLDB 1996: 134-145
- [38]
- ...
- [39]
- Tian Zhang, Raghu Ramakrishnan, Miron Livny:
BIRCH: An Efficient Data Clustering Method for Very Large Databases.
SIGMOD Conference 1996: 103-114
Copyright © Tue Mar 16 02:22:07 2010
by Michael Ley (ley@uni-trier.de)