VLDB 1996: 134-145
@inproceedings{DBLP:conf/vldb/Toivonen96,
author = {Hannu Toivonen},
editor = {T. M. Vijayaraman and
Alejandro P. Buchmann and
C. Mohan and
Nandlal L. Sarda},
title = {Sampling Large Databases for Association Rules},
booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
publisher = {Morgan Kaufmann},
year = {1996},
isbn = {1-55860-382-4},
pages = {134-145},
ee = {db/conf/vldb/Toivonen96.html},
crossref = {DBLP:conf/vldb/96},
bibsource = {DBLP}
}
Discovery of association rules is an important database mining problem.
Current algorithms for finding association rules require several passes
over the analyzed database, so I/O overhead is very significant for
very large databases. We present new algorithms that reduce the database
activity considerably. The idea is to pick a random sample, use it to find
all association rules that probably hold in the whole database, and then
verify the results against the rest of the database. The algorithms thus
produce exact association rules, not
approximations based on a sample. The approach is, however, probabilistic,
and in those rare cases where our sampling method does not produce all
association rules, the missing rules can be found in a second pass. Our
experiments show that the proposed algorithms can find association rules
very efficiently in only one database pass.
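
The sample-then-verify idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function names, the `lowered_support` parameter, and the brute-force level-wise search are assumptions made for illustration. The sketch also does not detect frequent itemsets that the sample missed, which is the rare case the abstract says a second pass handles. For simplicity, the verification pass counts over the whole database rather than only the part outside the sample.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search over a list of item sets.

    Returns {frozenset: support fraction} for every itemset whose
    support in `transactions` is at least `min_support`.
    """
    n = len(transactions)
    result = {}
    # Level 1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(level)
        # Join frequent k-sets into (k+1)-sets, then drop any candidate
        # that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return result


def sample_then_verify(database, min_support, sample_size, lowered_support, seed=0):
    """Mine a random sample, then confirm candidates in one database pass.

    `lowered_support` (hypothetical parameter) should be below
    `min_support` so that truly frequent itemsets are unlikely to be
    missed in the sample.
    """
    rng = random.Random(seed)
    sample = rng.sample(database, sample_size)
    candidates = frequent_itemsets(sample, lowered_support)

    # Single pass over the database to obtain exact counts.
    counts = dict.fromkeys(candidates, 0)
    for t in database:
        for c in candidates:
            if c <= t:
                counts[c] += 1

    n = len(database)
    return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
```

Because the supports returned come from exact counts over the database, every itemset reported is truly frequent. The only possible error is omission, which depends on how far `lowered_support` sits below `min_support` and on the sample size.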
