Mining Surprising Patterns Using Temporal Description Length.

Soumen Chakrabarti, Sunita Sarawagi, Byron Dom: Mining Surprising Patterns Using Temporal Description Length. VLDB 1998: 606-617

@inproceedings{DBLP:conf/vldb/ChakrabartiSD98,
  author    = {Soumen Chakrabarti and
               Sunita Sarawagi and
               Byron Dom},
  editor    = {Ashish Gupta and
               Oded Shmueli and
               Jennifer Widom},
  title     = {Mining Surprising Patterns Using Temporal Description Length},
  booktitle = {VLDB'98, Proceedings of 24rd International Conference on Very
               Large Data Bases, August 24-27, 1998, New York City, New York,
               USA},
  publisher = {Morgan Kaufmann},
  year      = {1998},
  isbn      = {1-55860-566-5},
  pages     = {606-617},
  ee        = {db/conf/vldb/ChakrabartiSD98.html},
  crossref  = {DBLP:conf/vldb/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We propose a new notion of surprising temporal patterns in market basket data, and algorithms to find such patterns. This is distinct from finding frequent patterns as addressed in the common mining literature. We argue that once the analyst is already familiar with prevalent patterns in the data, the greatest incremental benefit is likely to be from changes in the relationship between item frequencies over time.

A simple measure of surprise is the extent of departure from a model, estimated using standard multivariate time series analysis. Unfortunately, such estimation involves models, smoothing windows and parameters whose optimal choices can vary dramatically from one application to another. In contrast, we propose a precise characterization of surprise based on the number of bits in which a basket sequence can be encoded under a carefully chosen coding scheme. In this scheme it is inexpensive to encode sequences of itemsets that have steady, hence likely to be well-known, correlation between items. Conversely, a sequence with large code length hints at a possibly surprising correlation.

Our notion of surprise also has the desirable property that the score of aset of items is offset by anything surprising that the user may already know from the marginal distribution of any proper subset. No parameters, such as support, confidence, or smoothing windows, need to be estimated or specified by the user.

We experimented with real-life market basket data. The algorithm successfully rejected a large number of frequent sets of items that bore obvious and steady complementary relations to each other, such as cereal and milk. Instead, our algorithm found itemsets that showed statistically strong fluctuations in correlation over time. These items had no obviously complementary roles.

Copyright © 1998 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Online Paper

Download PDF file (www.vldb.org, Darmstadt, Germany)
Download PDF file (www.acm.org, New York, USA)

ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...

Windows: Click the letter of your CD drive
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Mac: Click here
UNIX/LINUX: mount the CD and click on the path of your mount point:
/Anthology/Disc99_1 or /cdrom

ACM SIGMOD Anthology

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Windows: Click the letter of your CD drive
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Mac: Click here
UNIX/LINUX: mount the DVD and click on the path of your mount point:
/Anthology/aDVD1 or /dvd

Printed Edition

Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.): VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA. Morgan Kaufmann 1998, ISBN 1-55860-566-5
Contents

References

[1]: Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Database Mining: A Performance Perspective. IEEE Trans. Knowl. Data Eng. 5(6): 914-925(1993)
[2]: ...
[3]: Rakesh Agrawal, Giuseppe Psaila: Active Data Mining. KDD 1995: 3-8
[4]: Rakesh Agrawal, Giuseppe Psaila, Edward L. Wimmers, Mohamed Zaït: Querying Shapes of Histories. VLDB 1995: 502-514
[5]: ...
[6]: ...
[7]: ...
[8]: ...
[9]: ...
[10]: David Wai-Lok Cheung, Jiawei Han, Vincent T. Y. Ng, C. Y. Wong: Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique. ICDE 1996: 106-114
[11]: ...
[12]: ...
[13]: ...
[14]: ...
[15]: Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, A. Inkeri Verkamo: Finding Interesting Rules from Large Sets of Discovered Association Rules. CIKM 1994: 401-407
[16]: ...
[17]: Brian Lent, Rakesh Agrawal, Ramakrishnan Srikant: Discovering Trends in Text Databases. KDD 1997: 227-230
[18]: Banu Özden, Sridhar Ramaswamy, Abraham Silberschatz: Cyclic Association Rules. ICDE 1998: 412-421
[19]: ...
[20]: ...
[21]: Abraham Silberschatz, Alexander Tuzhilin: What Makes Patterns Interesting in Knowledge Discovery Systems. IEEE Trans. Knowl. Data Eng. 8(6): 970-974(1996)
[22]: Sergey Brin, Rajeev Motwani, Craig Silverstein: Beyond Market Baskets: Generalizing Association Rules to Correlations. SIGMOD Conference 1997: 265-276
[23]: ...
[24]: ...