@inproceedings{DBLP:conf/sigmod/AgrawalCFGHIIMMSS94,
  author    = {Rakesh Agrawal and Michael J. Carey and Christos Faloutsos and Sakti P. Ghosh and Maurice A. W. Houtsma and Tomasz Imielinski and Balakrishna R. Iyer and A. Mahboob and H. Miranda and Ramakrishnan Srikant and Arun N. Swami},
  editor    = {Richard T. Snodgrass and Marianne Winslett},
  title     = {Quest: A Project on Database Mining},
  booktitle = {Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, May 24-27, 1994},
  publisher = {ACM Press},
  year      = {1994},
  pages     = {514},
  ee        = {http://doi.acm.org/10.1145/191839.191972, db/conf/sigmod/sigmod94-514.html},
  crossref  = {DBLP:conf/sigmod/94},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
Several organizations have collected massive amounts of data. These data sets are usually stored on tertiary storage and are very slowly migrating to database systems. One of the reasons for the limited success of database systems in this area is that current database systems do not provide the necessary functionality for a user interested in taking advantage of this information.
Database mining refers to the efficient construction and verification of models of patterns embedded in large databases, and is emerging as a major application area for databases. The goal of the Quest project is to enhance database technology to address this problem.
In the demonstration at SIGMOD '94, we will show some of the database mining technology that we have developed. In particular, we will demonstrate mining of association rules over sales data captured by retail organizations, such as department stores, supermarkets and catalog companies. An example of such a rule is that 98% of customers who purchase tires and auto accessories also get automotive services done.
The interesting aspect of our software is that it is not verification driven, which is the current state of the art in industry. We ask the user to simply provide two input parameters driven by business considerations: i) minimum confidence, and ii) minimum support. The confidence in the above example was 98%; it indicates the fraction of cases in which, when the antecedent holds, the consequent of the rule also holds. The support of a rule is the fraction of total transactions in which the rule holds. By specifying minimum confidence and support, the user is asking for all the rules that have confidence above minimum confidence and that are present in at least the minimum support fraction of transactions. We do not then require any more human intervention, and we generate all the rules that satisfy these constraints.
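The definitions of support and confidence can be made concrete with a small sketch. The Python below is not the Quest software or its algorithm: it is a naive enumeration over every itemset, usable only on toy data, and the function names `support` and `mine_rules`, the sample transactions, and the choice to treat both thresholds as inclusive are assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_rules(transactions, min_support, min_confidence):
    """Naively list all rules (antecedent -> consequent) meeting both thresholds.

    Hypothetical helper: exhaustive enumeration, exponential in the number
    of distinct items. Thresholds are treated as inclusive (an assumption).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for candidate in combinations(items, size):
            # Support of the rule = support of antecedent and consequent together.
            rule_support = support(candidate, transactions)
            if rule_support == 0 or rule_support < min_support:
                continue
            # Split the itemset into every non-empty antecedent/consequent pair.
            for k in range(1, size):
                for antecedent in combinations(candidate, k):
                    consequent = tuple(i for i in candidate if i not in antecedent)
                    confidence = rule_support / support(antecedent, transactions)
                    if confidence >= min_confidence:
                        rules.append((antecedent, consequent, rule_support, confidence))
    return rules


if __name__ == "__main__":
    # Made-up sales data, loosely modelled on the tires example in the text.
    sales = [
        {"tires", "auto accessories", "automotive services"},
        {"tires", "auto accessories", "automotive services"},
        {"tires", "automotive services"},
        {"auto accessories"},
    ]
    for antecedent, consequent, s, c in mine_rules(sales, 0.5, 0.9):
        print(f"{antecedent} -> {consequent}  support={s:.2f} confidence={c:.2f}")
```

The user supplies only the two thresholds; every rule that passes both is returned without further interaction, which mirrors the non-verification-driven workflow described above.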
We will also demonstrate mining of sequential patterns in sales transactions. That is, we will show how to find what items customers buy over a series of visits in sequence (e.g. an order for sheets and pillowcases, followed by a comforter, followed by ruffles and shams). Again, we only require the user to specify minimum support, i.e. the minimum fraction of customer transaction sequences in which the pattern is required to be present. Our software then finds all sequential patterns that have minimum support. Note that a term of the sequence can have more than one item (e.g. sheets and pillowcases) and an item (or set of items) can appear multiple times.
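To illustrate what it means for a sequential pattern to be present in a customer's transaction sequence, the following Python sketch checks containment and computes support for a single given pattern. It does not mine patterns and is not the Quest implementation; the names `contains_pattern` and `pattern_support` and the sample data are hypothetical. Each term of a pattern is a set of items, and a term matches a visit if all of its items were bought in that visit.

```python
def contains_pattern(customer_sequence, pattern):
    """True if the pattern's terms occur, in order, in distinct visits.

    customer_sequence: list of item sets, one per visit, in time order.
    pattern: list of item sets; each term must be a subset of some visit.
    Greedy earliest matching is sufficient for this containment test.
    """
    position = 0
    for term in pattern:
        term = frozenset(term)
        # Advance to the first remaining visit that includes the whole term.
        while position < len(customer_sequence) and not term <= frozenset(
            customer_sequence[position]
        ):
            position += 1
        if position == len(customer_sequence):
            return False
        position += 1  # the next term must match a strictly later visit
    return True


def pattern_support(customer_sequences, pattern):
    """Fraction of customer sequences that contain the pattern."""
    matches = sum(1 for s in customer_sequences if contains_pattern(s, pattern))
    return matches / len(customer_sequences)


if __name__ == "__main__":
    # Made-up customer histories based on the bedding example in the text.
    customers = [
        [{"sheets", "pillowcases"}, {"comforter"}, {"ruffles", "shams"}],
        [{"comforter"}, {"sheets", "pillowcases"}],
        [{"sheets", "pillowcases", "towels"}, {"towels"}, {"comforter"}],
    ]
    pattern = [{"sheets", "pillowcases"}, {"comforter"}]
    print(pattern_support(customers, pattern))
```

A miner would generate candidate patterns and keep those whose support reaches the user's minimum; only the support test is sketched here.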
Copyright © 1994 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.