@inproceedings{DBLP:conf/vldb/KnorrN99,
  author    = {Edwin M. Knorr and Raymond T. Ng},
  editor    = {Malcolm P. Atkinson and Maria E. Orlowska and Patrick Valduriez and Stanley B. Zdonik and Michael L. Brodie},
  title     = {Finding Intensional Knowledge of Distance-Based Outliers},
  booktitle = {VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK},
  publisher = {Morgan Kaufmann},
  year      = {1999},
  isbn      = {1-55860-615-7},
  pages     = {211-222},
  ee        = {db/conf/vldb/KnorrN99.html},
  crossref  = {DBLP:conf/vldb/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
Existing studies on outliers focus only on the identification aspect; none provides any intensional knowledge of the outliers, by which we mean a description or an explanation of why an identified outlier is exceptional. For many applications, a description or explanation is at least as vital to the user as the identification aspect. Specifically, intensional knowledge helps the user to: (i) evaluate the validity of the identified outliers, and (ii) improve the user's understanding of the data.
The two main issues addressed in this paper are: what kinds of intensional knowledge to provide, and how to optimize the computation of such knowledge. With respect to the first issue, we propose finding strongest and weak outliers and their corresponding structural intensional knowledge. With respect to the second issue, we first present a naive and a semi-naive algorithm. Then, by means of what we call path and semi-lattice sharing of I/O processing, we develop two optimized approaches. We provide analytic results on their I/O performance, and present experimental results showing significant reductions in I/O and significant speedups in overall runtime.
Copyright © 1999 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.