Incremental Clustering for Mining in a Data Warehousing Environment.

Martin Ester, Hans-Peter Kriegel, Jörg Sander, Michael Wimmer, Xiaowei Xu: Incremental Clustering for Mining in a Data Warehousing Environment. VLDB 1998: 323-333

@inproceedings{DBLP:conf/vldb/EsterKSWX98,
  author    = {Martin Ester and
               Hans-Peter Kriegel and
               J{\"o}rg Sander and
               Michael Wimmer and
               Xiaowei Xu},
  editor    = {Ashish Gupta and
               Oded Shmueli and
               Jennifer Widom},
  title     = {Incremental Clustering for Mining in a Data Warehousing Environment},
  booktitle = {VLDB'98, Proceedings of 24rd International Conference on Very
               Large Data Bases, August 24-27, 1998, New York City, New York,
               USA},
  publisher = {Morgan Kaufmann},
  year      = {1998},
  isbn      = {1-55860-566-5},
  pages     = {323-333},
  ee        = {db/conf/vldb/EsterKSWX98.html},
  crossref  = {DBLP:conf/vldb/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Data warehouses provide a great deal of opportunities for performing data mining tasks such as classification and clustering. Typically, updates are collected and applied to the data warehouse periodically in a batch mode, e.g., during the night. Then, all patterns derived from the warehouse by some data mining algorithm have to be updated as well. Due to the very large size of the databases, it is highly desirable to perform these updates incrementally. In this paper, we present the first incremental clustering algorithm. Our algorithm is based on the clustering algorithm DBSCAN which is applicable to any database containing data from a metric space, e.g., to a spatial database or to a WWW-log database. Due to the density-based nature of DBSCAN, the insertion or deletion of anobject affects the current clustering only in the neighborhood of this object. Thus, efficient algorithms can be given for incremental insertions and deletions to an existing clustering. Based on the formal definition of clusters, it can be proven that the incremental algorithm yields the same result as DBSCAN. A performance evaluation of Incremental DBSCAN on a spatial database as well as on a WWW-log database is presented, demonstrating the efficiency of the proposed algorithm. Incremental DBSCAN yields significant speed-up factors over DBSCAN even for large numbers of daily updates in a data warehouse.

Copyright © 1998 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Online Paper

Download PDF file (www.vldb.org, Darmstadt, Germany)
Download PDF file (www.acm.org, New York, USA)

ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...

Windows: Click the letter of your CD drive
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Mac: Click here
UNIX/LINUX: mount the CD and click on the path of your mount point:
/Anthology/Disc99_1 or /cdrom

ACM SIGMOD Anthology

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Windows: Click the letter of your CD drive
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Mac: Click here
UNIX/LINUX: mount the DVD and click on the path of your mount point:
/Anthology/aDVD1 or /dvd

Printed Edition

Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.): VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA. Morgan Kaufmann 1998, ISBN 1-55860-566-5
Contents

References

[AF 96]: ...
[AS 94]: Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[BKSS 90]: Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331
[Bou 96]: Athman Bouguettaya: On-Line Clustering. IEEE Trans. Knowl. Data Eng. 8(2): 333-339(1996)
[CHNW 96]: David Wai-Lok Cheung, Jiawei Han, Vincent T. Y. Ng, C. Y. Wong: Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique. ICDE 1996: 106-114
[CPZ 97]: Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997: 426-435
[EKSX 96]: Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996: 226-231
[EKX 95]: Martin Ester, Hans-Peter Kriegel, Xiaowei Xu: Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification. SSD 1995: 67-82
[EW 98]: Martin Ester, Rüdiger Wittmann: Incremental Generalization for Mining in a Data Warehousing Environment. EDBT 1998: 135-149
[FAAM 97]: Ronen Feldman, Yonatan Aumann, Amihood Amir, Heikki Mannila: Efficient Algorithms for Discovering Frequent Sets in Incremental Databases. DMKD 1997: 0-
[FPS 96]: Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth: Knowledge Discovery and Data Mining: Towards a Unifying Framework. KDD 1996: 82-88
[Gue 94]: Ralf Hartmut Güting: An Introduction to Spatial Database Systems. VLDB J. 3(4): 357-399(1994)
[HCC93]: Jiawei Han, Yandong Cai, Nick Cercone: Data-Driven Discovery of Quantitative Rules in Relational Databases. IEEE Trans. Knowl. Data Eng. 5(1): 29-40(1993)
[Huy 97]: Nam Huyn: Multiple-View Self-Maintenance in Data Warehousing Environments. VLDB 1997: 26-35
[KR 90]: L. Kaufman, P. J. Rousseeuw: Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley 1990
[Luo 95]: ...
[MJHS 96]: ...
[MQM 97]: Inderpal Singh Mumick, Dallan Quass, Barinderpal Singh Mumick: Maintenance of Data Cubes and Summary Tables in a Warehouse. SIGMOD Conference 1997: 100-111
[NH 94]: Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155
[SEKX 98]: ...
[Sib 73]: R. Sibson: SLINK: An Optimally Efficient Algorithm for the Single-Link Cluster Method. Comput. J. 16(1): 30-34(1973)
[ZRL 96]: Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD Conference 1996: 103-114