SPRINT: A Scalable Parallel Classifier for Data Mining.

John C. Shafer, Rakesh Agrawal, Manish Mehta: SPRINT: A Scalable Parallel Classifier for Data Mining. VLDB 1996: 544-555
@inproceedings{DBLP:conf/vldb/ShaferAM96,
  author    = {John C. Shafer and
               Rakesh Agrawal and
               Manish Mehta 0002},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {SPRINT: A Scalable Parallel Classifier for Data Mining},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {544-555},
  ee        = {db/conf/vldb/ShaferAM96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Classification is an important data mining problem. Although classification is a well-studied problem, most current classification algorithms are designed only for memory-resident data, which limits their suitability for mining large databases. The recently proposed SLIQ classification algorithm addressed several issues in building a fast, scalable classifier. Unfortunately, SLIQ still requires some information to stay memory-resident. Furthermore, this information grows in direct proportion to the number of input records, putting a hard limit on the size of data that can be classified.

We present for the first time a decision-tree-based classification algorithm that removes all of the memory restrictions, and is fast and scalable. The algorithm has also been designed to be easily parallelized. This parallelization, also presented here, represents the first scalable parallelization of a decision-tree classifier where all processors work together to build a single consistent model. The combination of these characteristics makes the proposed algorithm an ideal tool for data mining.

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4

References

[1]
Rakesh Agrawal, Sakti P. Ghosh, Tomasz Imielinski, Balakrishna R. Iyer, Arun N. Swami: An Interval Classifier for Database Mining Applications. VLDB 1992: 560-573
[2]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Database Mining: A Performance Perspective. IEEE Trans. Knowl. Data Eng. 5(6): 914-925 (1993)
[3]
...
[4]
...
[5]
Philip K. Chan, Salvatore J. Stolfo: Experiments on Multi-Strategy Learning by Meta-Learning. CIKM 1993: 314-323
[6]
...
[7]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen: The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng. 2(1): 44-62 (1990)
[8]
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting. PDIS 1991: 280-291
[9]
...
[10]
...
[11]
David E. Goldberg: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley 1989, ISBN 0-201-15767-5
[12]
...
[13]
Mike James: Classification Algorithms. John Wiley 1985, ISBN 0-471-84799-2
[14]
...
[15]
Manish Mehta, Rakesh Agrawal, Jorma Rissanen: SLIQ: A Fast Scalable Classifier for Data Mining. EDBT 1996: 18-32
[16]
Donald Michie, David J. Spiegelhalter, C. C. Taylor: Machine Learning, Neural and Statistical Classification. Ellis Horwood 1994, ISBN 0-13-106360-X
[17]
...
[18]
...
[19]
J. Ross Quinlan: Induction of Decision Trees. Machine Learning 1(1): 81-106 (1986)
[20]
J. Ross Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann 1993, ISBN 1-55860-238-0
[21]
...
[22]
...
[23]
...
[24]
Sholom M. Weiss, Casimir A. Kulikowski: Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems. Morgan Kaufmann 1990, ISBN 1-55860-065-5
[25]
...

Copyright © Tue Mar 16 02:22:06 2010 by Michael Ley (ley@uni-trier.de)