Sampling Large Databases for Association Rules.

Hannu Toivonen: Sampling Large Databases for Association Rules. VLDB 1996: 134-145

@inproceedings{DBLP:conf/vldb/Toivonen96,
  author    = {Hannu Toivonen},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {Sampling Large Databases for Association Rules},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {134-145},
  ee        = {db/conf/vldb/Toivonen96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Discovery of association rules is an important database mining problem. Current algorithms for finding association rules require several passes over the analyzed database, and obviously the role of I/O overhead is very significant for very large databases. We present new algorithms that reduce the database activity considerably. The idea is to pick a random sample, to find using this sample all association rules that probably hold in the whole database, and then to verify the results with the rest of the database. The algorithms thus produce exact association rules, not approximations based on a sample. The approach is, however, probabilistic, and in those rare cases where our sampling method does not produce all association rules, the missing rules can be found in a second pass. Our experiments show that the proposed algorithms can find association rules very efficiently in only one database pass.

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Online Paper

Download PDF file (www.vldb.org, Darmstadt, Germany)
Download PDF file (www.acm.org, New York, USA)

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ...

Windows: Click the letter of your CD drive
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Mac: Click here
UNIX/LINUX: mount the CD and click on the path of your mount point:
/Anthology/vldb8997 or /cdrom

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Windows: Click the letter of your CD drive
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Mac: Click here
UNIX/LINUX: mount the DVD and click on the path of your mount point:
/Anthology/aDVD1 or /dvd

Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4
Contents

Electronic Edition

References

[AIS93]: Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216
[AMS+96]: Rakesh Agrawal, Heikki Mannila, Ramakrishnan Srikant, Hannu Toivonen, A. Inkeri Verkamo: Fast Discovery of Association Rules. Advances in Knowledge Discovery and Data Mining 1996: 307-328
[AS92]: Noga Alon, Joel Spencer: The Probabilistic Method. John Wiley 1992, ISBN 0-471-53588-5
Contents
[AS94]: Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[FPSSU96]: Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, Ramasamy Uthurusamy (Eds.): Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press 1996, ISBN 0-262-56097-6
Contents
[HF95]: Jiawei Han, Yongjian Fu: Discovery of Multiple-Level Association Rules from Large Databases. VLDB 1995: 420-431
[HKMT95]: Marcel Holsheimer, Martin L. Kersten, Heikki Mannila, Hannu Toivonen: A Perspective on Databases and Data Mining. KDD 1995: 150-155
[HS92]: Peter J. Haas, Arun N. Swami: Sequential Sampling Procedures for Query Size Estimation. SIGMOD Conference 1992: 341-350
[HS93]: ...
[KM94]: Jyrki Kivinen, Heikki Mannila: The Power of Sampling in Knowledge Discovery. PODS 1994: 77-85
[KMR+94]: Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, A. Inkeri Verkamo: Finding Interesting Rules from Large Sets of Discovered Association Rules. CIKM 1994: 401-407
[LSL95]: Hongjun Lu, Rudy Setiono, Huan Liu: NeuroRule: A Connectionist Approach to Data Mining. VLDB 1995: 478-489
[MT96]: ...
[MTV94]: Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo: Efficient Algorithms for Discovering Association Rules. KDD Workshop 1994: 181-192
[OR89]: Frank Olken, Doron Rotem: Random Sampling from B+ Trees. VLDB 1989: 269-277
[PCY95]: Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186
[PSF91]: Gregory Piatetsky-Shapiro, William J. Frawley (Eds.): Knowledge Discovery in Databases. AAAI/MIT Press 1991, ISBN 0-262-62080-4
Contents
[SA95]: Ramakrishnan Srikant, Rakesh Agrawal: Mining Generalized Association Rules. VLDB 1995: 407-419
[SON95]: Ashok Savasere, Edward Omiecinski, Shamkant B. Navathe: An Efficient Algorithm for Mining Association Rules in Large Databases. VLDB 1995: 432-444
[TKR+95]: ...