Quest: A Project on Database Mining.

Rakesh Agrawal, Michael J. Carey, Christos Faloutsos, Sakti P. Ghosh, Maurice A. W. Houtsma, Tomasz Imielinski, Balakrishna R. Iyer, A. Mahboob, H. Miranda, Ramakrishnan Srikant, Arun N. Swami: Quest: A Project on Database Mining. SIGMOD Conference 1994: 514

@inproceedings{DBLP:conf/sigmod/AgrawalCFGHIIMMSS94,
  author    = {Rakesh Agrawal and
               Michael J. Carey and
               Christos Faloutsos and
               Sakti P. Ghosh and
               Maurice A. W. Houtsma and
               Tomasz Imielinski and
               Balakrishna R. Iyer and
               A. Mahboob and
               H. Miranda and
               Ramakrishnan Srikant and
               Arun N. Swami},
  editor    = {Richard T. Snodgrass and
               Marianne Winslett},
  title     = {Quest: A Project on Database Mining},
  booktitle = {Proceedings of the 1994 ACM SIGMOD International Conference on
               Management of Data, Minneapolis, Minnesota, May 24-27, 1994},
  publisher = {ACM Press},
  year      = {1994},
  pages     = {514},
  ee        = {http://doi.acm.org/10.1145/191839.191972, db/conf/sigmod/sigmod94-514.html},
  crossref  = {DBLP:conf/sigmod/94},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

1. Background

Several organizations have collected massive amounts of data. These data sets are usually stored on tertiary storage and are very slowly migrating to database systems. One of the reasons for the limited success of database systems in this area is that current database systems do not provide the necessary functionality for a user interested in taking advantage of this information.

Database mining refers to the efficient construction and verification of models of patterns embedded in large databases, and is emerging as a major application area for databases. The goal of the Quest project is to enhance database technology to address this problem.

2. Prototype

In the demonstration at Sigmod94, we will show some of the database mining technology that we have developed. In particular, we will demonstrate mining of association rules over sales data captured by retail organizations, such as department stores, supermarkets and catalog companies. An example of such a rule is that 98% of customers that purchase tires and auto accessories also get automotive services done.

The interesting aspect of our software is that it is not verification driven, whereas verification-driven tools are the current state of the art in industry. We ask the user simply to provide two input parameters driven by business considerations: i) minimum confidence, and ii) minimum support. The confidence in the above example was 98%; it indicates the fraction of cases in which, when the antecedent holds, the consequent of the rule also holds. The support of a rule is the fraction of total transactions in which the rule holds. By specifying minimum confidence and support, the user is asking for all the rules that have confidence above minimum confidence and that are present in at least the minimum-support fraction of transactions. We then require no further human intervention, and we generate all the rules that satisfy these constraints.
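
As an illustration of the two parameters, the following minimal Python sketch computes support and confidence over a toy set of transactions and enumerates every rule that meets both thresholds. It is a brute-force enumeration written for clarity, not the algorithm used in Quest; the function names, the sample data, and the use of "at least" for both thresholds are assumptions made for this example.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(a <= t for t in transactions)
    return sum(both <= t for t in transactions) / n_a if n_a else 0.0

def mine_rules(transactions, min_support, min_confidence):
    """Brute-force search over all itemsets (exponential in the number of
    distinct items); suitable only for tiny examples."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            if support(itemset, transactions) < min_support:
                continue
            # Split the itemset into every antecedent/consequent pair.
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    if confidence(antecedent, consequent, transactions) >= min_confidence:
                        rules.append((antecedent, consequent))
    return rules

# Hypothetical sales data: one set of purchased items per transaction.
transactions = [
    frozenset({"tires", "accessories", "service"}),
    frozenset({"tires", "accessories", "service"}),
    frozenset({"tires", "accessories"}),
    frozenset({"tires"}),
    frozenset({"service"}),
]

rules = mine_rules(transactions, min_support=0.3, min_confidence=0.6)
```

With this data, the rule "tires and accessories imply service" is reported, while "tires imply service" is not, because only half of the transactions containing tires also contain service.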

We will also demonstrate mining of sequential patterns in sales transactions. That is, we will show how to find what items customers buy over a series of visits in sequence (e.g. an order for sheets and pillowcases, followed by a comforter, followed by ruffles and shams). Again, we only require the user to specify minimum support, i.e. the minimum fraction of customer transaction sequences in which the pattern is required to be present. Our software then finds all sequential patterns that have minimum support. Note that a term of the sequence can have more than one item (e.g. sheets and pillowcases) and an item (or set of items) can appear multiple times.
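
The support of a sequential pattern can be sketched in Python as follows. The sketch only checks the support of one given pattern; it does not search for all patterns, and it is not the Quest implementation. It assumes that each customer's history is a time-ordered list of visits, that each visit is a set of items, and that the terms of a pattern must match distinct visits in order. The function names and the sample data are hypothetical.

```python
def contains_pattern(sequence, pattern):
    """True if each term of `pattern` is a subset of some visit in
    `sequence`, with the matched visits occurring in the same order."""
    pos = 0
    for term in pattern:
        term = frozenset(term)
        # Advance to the earliest remaining visit that contains this term.
        while pos < len(sequence) and not term <= frozenset(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next term must match a strictly later visit
    return True

def sequence_support(customer_sequences, pattern):
    """Fraction of customer sequences that contain `pattern`."""
    hits = sum(contains_pattern(s, pattern) for s in customer_sequences)
    return hits / len(customer_sequences)

# Hypothetical data: one time-ordered list of visits per customer.
customers = [
    [{"sheets", "pillowcases"}, {"comforter"}, {"ruffles", "shams"}],
    [{"sheets", "pillowcases", "towels"}, {"lamp"}, {"comforter"}, {"ruffles", "shams"}],
    [{"comforter"}, {"sheets", "pillowcases"}],
    [{"sheets"}, {"pillowcases"}, {"comforter"}],
]

pattern = [{"sheets", "pillowcases"}, {"comforter"}, {"ruffles", "shams"}]
pattern_support = sequence_support(customers, pattern)
```

Here the first two customers contain the pattern, the third buys the items in the wrong order, and the fourth buys sheets and pillowcases on separate visits, so the pattern is present in half of the sequences.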

Copyright © 1994 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.

Printed Edition

Richard T. Snodgrass, Marianne Winslett (Eds.): Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, May 24-27, 1994. ACM Press 1994, SIGMOD Record 23(2), June 1994
