9. KDD 2003: Washington, DC, USA

Lise Getoor, Ted E. Senator, Pedro Domingos, Christos Faloutsos (Eds.): Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24 - 27, 2003. ACM 2003, ISBN 1-58113-737-0

@proceedings{DBLP:conf/kdd/2003,
  editor    = {Lise Getoor and
               Ted E. Senator and
               Pedro Domingos and
               Christos Faloutsos},
  title     = {Proceedings of the Ninth ACM SIGKDD International Conference
               on Knowledge Discovery and Data Mining, Washington, DC, USA,
               August 24 - 27, 2003},
  booktitle = {KDD},
  publisher = {ACM},
  year      = {2003},
  isbn      = {1-58113-737-0},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Invited talks

Jim Gray:
On-line science: the world-wide telescope as a prototype for the new computational science. 3
Daphne Koller:
Statistical learning from relational data. 4
Andreas S. Weigend:
Analyzing customer behavior at Amazon.com. 5

Research track

Charu C. Aggarwal:
Towards systematic design of distance functions for data mining applications. 9-18
Arindam Banerjee, Inderjit S. Dhillon, Joydeep Ghosh, Suvrit Sra:
Generative model-based clustering of directional data. 19-28
Stephen D. Bay, Mark Schwabacher:
Mining distance-based outliers in near linear time with randomization and a simple pruning rule. 29-38
Mikhail Bilenko, Raymond J. Mooney:
Adaptive duplicate detection using learnable string similarity measures. 39-48
Richard J. Bolton, Niall M. Adams:
An iterative hypothesis-testing strategy for pattern discovery. 49-58
Hervé Brönnimann, Bin Chen, Manoranjan Dash, Peter J. Haas, Peter Scheuermann:
Efficient data reduction with EASE. 59-68
Alain Casali, Rosine Cicchetti, Lotfi Lakhal:
Extracting semantics from data cubes using cube transversals and closures. 69-78
Darya Chudova, Scott Gaffney, Eric Mjolsness, Padhraic Smyth:
Translation-invariant mixture models for curve clustering. 79-88
Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha:
Information-theoretic co-clustering. 89-98
Magdalini Eirinaki, Michalis Vazirgiannis, Iraklis Varlamis:
SEWeP: using site semantics and a taxonomy to enhance the Web personalization process. 99-108
Mohammad El-Hajj, Osmar R. Zaïane:
Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining. 109-118
Oren Etzioni, Rattapoom Tuchinda, Craig A. Knoblock, Alexander Yates:
To buy or not to buy: mining airfare data to minimize ticket purchase price. 119-128
Aristides Gionis, Teija Kujala, Heikki Mannila:
Fragments of order. 129-136
David Kempe, Jon M. Kleinberg, Éva Tardos:
Maximizing the spread of influence through a social network. 137-146
Mehmet Koyutürk, Ananth Grama:
PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets. 147-156
Elias Pampalk, Werner Goebl, Gerhard Widmer:
Visualizing changes in the structure of data for exploratory feature selection. 157-166
Claudia Perlich, Foster J. Provost:
Aggregation-based feature invention and relational concept classes. 167-176
Sunita Sarawagi, Soumen Chakrabarti, Shantanu Godbole:
Cross-training: learning probabilistic mappings between topics. 177-186
Somayajulu Sripada, Ehud Reiter, Jim Hunter, Jin Yu:
Generating English summaries of time series data using the Gricean maxims. 187-196
Jeremy Tantrum, Alejandro Murua, Werner Stuetzle:
Assessment and pruning of hierarchical model based clustering. 197-205
Jaideep Vaidya, Chris Clifton:
Privacy-preserving k-means clustering over vertically partitioned data. 206-215
Michail Vlachos, Marios Hadjieleftheriou, Dimitrios Gunopulos, Eamonn J. Keogh:
Indexing multi-dimensional time-series with support for multiple distance measures. 216-225
Haixun Wang, Wei Fan, Philip S. Yu, Jiawei Han:
Mining concept-drifting data streams using ensemble classifiers. 226-235
Jianyong Wang, Jiawei Han, Jian Pei:
CLOSET+: searching for the best strategies for mining frequent closed itemsets. 236-245
Ke Wang, Yuelong Jiang, Laks V. S. Lakshmanan:
Mining unexpected rules by pushing user dynamics. 246-255
Geoffrey I. Webb, Shane M. Butler, Douglas A. Newlands:
On detecting differences between groups. 256-265
Scott White, Padhraic Smyth:
Algorithms for estimating relative importance in networks. 266-275
Xintao Wu, Daniel Barbará, Yong Ye:
Screening and interpreting multi-item associations based on log-linear modeling. 276-285
Xifeng Yan, Jiawei Han:
CloseGraph: mining closed frequent graph patterns. 286-295
Lan Yi, Bing Liu, Xiaoli Li:
Eliminating noisy information in Web pages for data mining. 296-305
Hwanjo Yu, Jiong Yang, Jiawei Han:
Classifying large data sets using SVMs with hierarchical clusters. 306-315
Mohammed Javeed Zaki, Charu C. Aggarwal:
XRules: an effective structural classifier for XML data. 316-325
Mohammed Javeed Zaki, Karam Gouda:
Fast vertical mining using diffsets. 326-335
Yunyue Zhu, Dennis Shasha:
Efficient elastic burst detection in data streams. 336-345

Industrial/government track

Kamal Ali, Steven P. Ketchpel:
Golden Path Analyzer: using divide-and-conquer to cluster Web clickstreams. 349-358
David M. Fram, June S. Almenoff, William DuMouchel:
Empirical Bayesian data mining for discovering patterns in post-marketing drug safety. 359-368
Tu Bao Ho, Trong Dung Nguyen, Saori Kawasaki, Si Quang Le, DucDung Nguyen, Hideto Yokoi, Katsuhiko Takabayashi:
Mining hepatitis data with temporal abstraction. 369-377
David Jensen, Matthew J. Rattigan, Hannah Blau:
Information awareness: a prospective technical assessment. 378-387
Mark Last, Menahem Friedman, Abraham Kandel:
The data mining approach to automated software testing. 388-396
Richard D. Lawrence, Se June Hong, Jacques Cherrier:
Passenger-based predictive modeling of airline no-show rates. 397-406
Gregory Piatetsky-Shapiro, Tom Khabaza, Sridhar Ramaswamy:
Capturing best practice for microarray gene expression data analysis. 407-415
R. Bharat Rao, Sathyakama Sandilya, Radu Stefan Niculescu, Colin Germond, Harsha Rao:
Clinical and financial outcomes analysis with existing hospital patient records. 416-425
Ramendra K. Sahoo, Adam J. Oliner, Irina Rish, Manish Gupta, José E. Moreira, Sheng Ma, Ricardo Vilalta, Anand Sivasubramaniam:
Critical event prediction for proactive management in large-scale computer clusters. 426-435
Rong She, Fei Chen, Ke Wang, Martin Ester, Jennifer L. Gardy, Fiona S. L. Brinkman:
Frequent-subsequence-based prediction of outer membrane proteins. 436-445
Michael Steinbach, Pang-Ning Tan, Vipin Kumar, Steven A. Klooster, Christopher Potter:
Discovery of climate indices using clustering. 446-455
Sholom M. Weiss, Stephen J. Buckley, Shubir Kapoor, Søren Damgaard:
Knowledge-based data mining. 456-461
Yi-Leh Wu, Kingshy Goh, Beitao Li, Huaxin You, Edward Y. Chang:
The anatomy of a multimodal information filter. 462-471

Research track

Shlomo Argamon, Marin Saric, Sterling Stuart Stein:
Style mining of electronic messages for multiple authorship discrimination: first results. 475-480
Raj Bhatnagar, Goutham Kurra, Wen Niu:
Mining high dimensional data for classifier knowledge. 481-486
Joong Hyuk Chang, Won Suk Lee:
Finding recent frequent itemsets adaptively over online data streams. 487-492
Bill Yuan-chi Chiu, Eamonn J. Keogh, Stefano Lonardi:
Probabilistic discovery of time series motifs. 493-498
William W. Cohen, Richard C. Wang, Robert F. Murphy:
Understanding captions in biomedical publications. 499-504
Wenliang Du, Zhijun Zhan:
Using randomized response techniques for privacy-preserving data mining. 505-510
William DuMouchel, Deepak K. Agarwal:
Applications of sampling and fractional factorial designs to model-free data squashing. 511-516
Dmitriy Fradkin, David Madigan:
Experiments with random projections for machine learning. 517-522
João Gama, Ricardo Rocha, Pedro Medas:
Accurate decision trees for mining high-speed data streams. 523-528
Sudipto Guha, Dimitrios Gunopulos, Nick Koudas:
Correlating synchronous and asynchronous data streams. 529-534
Sule Gündüz, M. Tamer Özsu:
A Web page prediction model based on click-stream tree representation of user behavior. 535-540
John E. Hopcroft, Omar Khan, Brian Kulis, Bart Selman:
Natural communities in large linked networks. 541-546
Michael E. Houle:
Navigating massive data sets via local clustering. 547-552
Wynne Hsu, Jing Dai, Mong-Li Lee:
Mining viewpoint patterns in image databases. 553-558
Chris Jermaine:
Playing hide-and-seek with correlations. 559-564
Daxin Jiang, Jian Pei, Aidong Zhang:
Interactive exploration of coherent patterns in time-series gene expression data. 565-570
Ruoming Jin, Gagan Agrawal:
Efficient decision tree construction on streaming data. 571-576
Sachindra Joshi, Neeraj Agrawal, Raghu Krishnapuram, Sumit Negi:
A bag of paths model for measuring structural similarity in Web documents. 577-582
Toshihiro Kamishima:
Nantonac collaborative filtering: recommendation based on order responses. 583-588
Yehuda Koren, David Harel:
A two-way visualization method for clustered data. 589-594
Kelvin T. Leung, Douglas Stott Parker Jr.:
Empirical comparisons of various voting methods in bagging. 595-600
Bing Liu, Robert L. Grossman, Yanhong Zhai:
Mining data records in Web pages. 601-606
Guimei Liu, Hongjun Lu, Wenwu Lou, Jeffrey Xu Yu:
On computing, storing and querying frequent patterns. 607-612
Junshui Ma, Simon Perkins:
Online novelty detection on temporal sequences. 613-618
Satoshi Morinaga, Kenji Yamanishi, Jun-ichi Takeuchi:
Distributed cooperative mining for information consortia. 619-624
Jennifer Neville, David Jensen, Lisa Friedland, Michael Hay:
Learning relational probability trees. 625-630
Caleb C. Noble, Diane J. Cook:
Graph-based anomaly detection. 631-636
Feng Pan, Gao Cong, Anthony K. H. Tung, Jiong Yang, Mohammed Javeed Zaki:
Carpenter: finding closed patterns in long biological datasets. 637-642
William Peter, John Chiochetti, Clare Giardina:
New unsupervised clustering algorithm for large datasets. 643-648
Karlton Sequeira, Mohammed Javeed Zaki, Boleslaw K. Szymanski, Christopher D. Carothers:
Improving spatial locality of programs via data mining. 649-654
Chun Tang, Aidong Zhang, Jian Pei:
Mining phenotypes and informative genes from gene expression data. 655-660
Feng Tao, Fionn Murtagh, Mohsen Farid:
Weighted Association Rule Mining using weighted support and significance framework. 661-666
Soon Tee Teoh, Kwan-Liu Ma:
PaintingClass: interactive construction, visualization and exploration of decision trees. 667-672
Ioannis Tsamardinos, Constantin F. Aliferis, Alexander R. Statnikov:
Time and sample efficient discovery of Markov blankets and direct causal relations. 673-678
Hang Yu, Ee-Chien Chang:
Distributed multivariate regression based on influential observations. 679-684
Lei Yu, Huan Liu:
Efficiently handling feature redundancy in high-dimensional data. 685-690

Industrial/government track

Rafael Alonso, Jeffrey A. Bloom, Hua Li, Chumki Basu:
An adaptive nearest neighbor search for a parts acquisition ePortal. 693-698
Philip S. Barry, Jianping Zhang, Mary McDonald:
Architecting a knowledge discovery engine for military commanders utilizing massive runs of simulations. 699-704
Tamraparni Dasu, Gregg T. Vesonder, Jon R. Wright:
Data quality through knowledge engineering. 705-710
Gloria T. Lau, Kincho H. Law, Gio Wiederhold:
Similarity analysis on government regulations. 711-716
Uwe F. Mayer, Armand Sarkissian:
Experimental design for solicitation campaigns. 717-722
Matthew Eric Otey, Srinivasan Parthasarathy, Amol Ghoting, G. Li, Sundeep Narravula, Dhabaleswar K. Panda:
Towards NIC-based intrusion detection. 723-728
Chang-Shing Perng, David Thoenen, Genady Grabarnik, Sheng Ma, Joseph L. Hellerstein:
Data-driven validation, completion and construction of event relationship networks. 729-734
Kevin B. Pratt, Gleb Tschapek:
Visualizing concept drift. 735-740
Keiko Shimazu, Atsuhito Momma, Koichi Furukawa:
Experimental study of discovering essential information from customer inquiry. 741-746
Zhongfei (Mark) Zhang, John J. Salerno, Philip S. Yu:
Applying data mining in investigating money laundering crimes. 747-752