13. KDD 2007: San Jose, California, USA

Pavel Berkhin, Rich Caruana, Xindong Wu (Eds.): Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007. ACM 2007, ISBN 978-1-59593-609-7

Chris Anderson:
Calculating latent demand in the long tail. 1
Usama M. Fayyad:
From mining the web to inventing the new sciences underlying the internet. 2-3
Jon M. Kleinberg:
Challenges in mining social network data: processes, privacy, and paradoxes. 4-5

Research track papers

Deepak Agarwal, Dhiman Barman, Dimitrios Gunopulos, Neal E. Young, Flip Korn, Divesh Srivastava:
Efficient and effective explanation of change in hierarchical summaries. 6-15
Deepak Agarwal, Andrei Z. Broder, Deepayan Chakrabarti, Dejan Diklic, Vanja Josifovski, Mayssam Sayyadian:
Estimating rates of rare events at multiple resolutions. 16-25
Deepak Agarwal, Srujana Merugu:
Predictive discrete latent factor models for large scale dyadic data. 26-35
Charu C. Aggarwal, Philip S. Yu:
On string classification in data streams. 36-45
Charu C. Aggarwal, Na Ta, Jianyong Wang, Jianhua Feng, Mohammed Javeed Zaki:
Xproj: a framework for projected structural clustering of XML documents. 46-55
Nikolay Archak, Anindya Ghose, Panagiotis G. Ipeirotis:
Show me the money!: deriving the pricing power of product features by mining consumer reviews. 56-65
Andrew Arnold, Yan Liu, Naoki Abe:
Temporal causal modeling with graphical Granger methods. 66-75
Ricardo A. Baeza-Yates, Alessandro Tiberi:
Extracting semantic relations from query logs. 76-85
Hila Becker, Marta Arias:
Real-time ranking with concept drift using expert advice. 86-94
Robert M. Bell, Yehuda Koren, Chris Volinsky:
Modeling relationships at multiple scales to improve accuracy of large recommender systems. 95-104
Deepavali Bhagwat, Kave Eshghi, Pankaj Mehra:
Content-based document routing and index partitioning for scalable similarity-based searches in a large corpus. 105-112
Wanpracha Art Chaovalitwongse, Ya-Ju Fan, Rajesh C. Sachdeo:
Support feature machine for classification of abnormal brain activity. 113-122
Jianhui Chen, Zheng Zhao, Jieping Ye, Huan Liu:
Nonlinear adaptive distance metric learning for clustering. 123-132
Yixin Chen, Li Tu:
Density-based clustering for real-time stream data. 133-142
Peter A. Chew, Brett W. Bader, Tamara G. Kolda, Ahmed Abdelali:
Cross-language information retrieval using PARAFAC2. 143-152
Yun Chi, Xiaodan Song, Dengyong Zhou, Koji Hino, Belle L. Tseng:
Evolutionary spectral clustering by incorporating temporal smoothness. 153-162
Yun Chi, Shenghuo Zhu, Xiaodan Song, Jun'ichi Tatemura, Belle L. Tseng:
Structural and temporal analysis of the blogosphere through community factorization. 163-172
Sumit Chopra, Trivikraman Thampy, John Leahy, Andrew Caplin, Yann LeCun:
Discovering the hidden structure of house prices with a non-parametric latent manifold model. 173-182
Paul Cotofrei, Kilian Stoffel:
Stochastic processes and temporal data mining. 183-190
Daniel Crabtree, Peter Andreae, Xiaoying Gao:
Exploiting underrepresented query aspects for automatic query expansion. 191-200
Aron Culotta, Michael L. Wick, Robert Hall, Matthew Marzilli, Andrew McCallum:
Canonicalization of database records using adaptive similarity measures. 201-209
Wenyuan Dai, Gui-Rong Xue, Qiang Yang, Yong Yu:
Co-clustering based classification for out-of-domain documents. 210-219
Kaustav Das, Jeff G. Schneider:
Detecting anomalous records in categorical datasets. 220-229
Anirban Dasgupta, Petros Drineas, Boulos Harb, Vanja Josifovski, Michael W. Mahoney:
Feature selection methods for text classification. 230-239
Ian Davidson, S. S. Ravi, Martin Ester:
Efficient incremental constrained clustering. 240-249
Meghana Deodhar, Joydeep Ghosh:
A framework for simultaneous co-clustering and learning from complex data. 250-259
Chris H. Q. Ding, Rong Jin, Tao Li, Horst D. Simon:
A learning framework using Green's function and kernel regularization with application to recommender system. 260-269
Dejing Dou, Gwen A. Frishkoff, Jiawei Rong, Robert Frank, Allen D. Malony, Don M. Tucker:
Development of NeuroElectroMagnetic ontologies (NEMO): a framework for mining brainwave ontologies. 270-279
Gregory Druck, Chris Pal, Andrew McCallum, Xiaojin Zhu:
Semi-supervised classification with hybrid generative/discriminative methods. 280-289
Lisa Friedland, David Jensen:
Finding tribes: identifying close-knit individuals from employment patterns. 290-299
Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Huan Liu, Philip S. Yu:
Time-dependent event hierarchy construction. 300-309
Byron J. Gao, Martin Ester, Jin-yi Cai, Oliver Schulte, Hui Xiong:
The minimum consistent subset cover problem and its applications in data mining. 310-319
Rong Ge, Martin Ester, Wen Jin, Ian Davidson:
Constraint-driven clustering. 320-329
Fosca Giannotti, Mirco Nanni, Fabio Pinelli, Dino Pedreschi:
Trajectory pattern mining. 330-339
Zhen Guo, Zhongfei Zhang, Eric P. Xing, Christos Faloutsos:
Enhanced max margin learning on multimodal data mining in a multimedia database. 340-349
Hannes Heikinheimo, Jouni K. Seppänen, Eino Hinkkanen, Heikki Mannila, Taneli Mielikäinen:
Finding low-entropy sets and trees from binary data. 350-359
Frizo A. L. Janssens, Wolfgang Glänzel, Bart De Moor:
Dynamic hybrid clustering of bioinformatics by incorporating text mining and citation analysis. 360-369
Yookyung Jo, Carl Lagoze, C. Lee Giles:
Detecting research topics via the correlation between graphs and texts. 370-379
Panagiotis Karras, Dimitris Sacharidis, Nikos Mamoulis:
Exploiting duality in summarization with deterministic guarantees. 380-389
Yiping Ke, James Cheng, Wilfred Ng:
Correlation search in graph databases. 390-399
Aleksander Kolcz, Wen-tau Yih:
Raising the baseline for high-precision text classifiers. 400-409
Srivatsan Laxman, P. S. Sastry, K. P. Unnikrishnan:
A fast algorithm for finding frequent episodes in event streams. 410-419
Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie S. Glance:
Cost-effective outbreak detection in networks. 420-429
Jinyan Li, Guimei Liu, Limsoon Wong:
Mining statistically important equivalence classes and delta-discriminative emerging patterns. 430-439
Ping Li:
Very sparse stable random projections for dimension reduction in l_α (0 < α ≤ 2) norm. 440-449
Yi Liu, Rong Jin, Anil K. Jain:
BoostCluster: boosting clustering by pairwise constraints. 450-459
David Lo, Siau-Cheng Khoo, Chao Liu:
Efficient mining of iterative patterns for software specification discovery. 460-469
Bo Long, Zhongfei (Mark) Zhang, Philip S. Yu:
A probabilistic framework for relational clustering. 470-479
Heikki Mannila, Evimaria Terzi:
Nestedness and segmented nestedness. 480-489
Qiaozhu Mei, Xuehua Shen, ChengXiang Zhai:
Automatic labeling of multinomial topic models. 490-499
David M. Mimno, Andrew McCallum:
Expertise modeling for matching papers with reviewers. 500-509
Flavia Moser, Rong Ge, Martin Ester:
Joint cluster analysis of attribute and relationship data without a-priori specification of the number of clusters. 510-519
Ramesh Nallapati, Susan Ditmore, John D. Lafferty, Kin Ung:
Multiscale topic tomography. 520-529
Siegfried Nijssen, Élisa Fromont:
Mining optimal decision trees from itemset lattices. 530-539
Gaurav Pandey, Michael Steinbach, Rohit Gupta, Tushar Garg, Vipin Kumar:
Association analysis-based transformations for protein interaction networks: a function prediction case study. 540-549
Seung-Taek Park, David M. Pennock:
Applying collaborative filtering techniques to movie search for better ranking and browsing. 550-559
Raymond K. Pon, Alfonso F. Cardenas, David Buttler, Terence Critchlow:
Tracking multiple topics for finding interesting articles. 560-569
Filip Radlinski, Thorsten Joachims:
Active exploration for learning rankings from clickthrough data. 570-579
Mark Sandler:
Hierarchical mixture models: a probabilistic analysis. 580-589
Issei Sato, Hiroshi Nakagawa:
Knowledge discovery of multiple-topic document using parametric mixture model with Dirichlet prior. 590-598
Vincent Schickel-Zuber, Boi Faltings:
Using hierarchical clustering for learning the ontologies used in recommendation systems. 599-608
D. Sculley:
Practical learning from one-sided feedback. 609-618
Benyah Shaparenko, Thorsten Joachims:
Information genealogy: uncovering the flow of ideas in non-hyperlinked document databases. 619-628
Shady Shehata, Fakhri Karray, Mohamed Kamel:
A concept-based model for enhancing text categorization. 629-637
Victor S. Sheng, Charles X. Ling:
Partial example acquisition in cost-sensitive learning. 638-646
Motoki Shiga, Ichigaku Takigawa, Hiroshi Mamitsuka:
A spectral clustering approach to optimally combining numerical vectors with a modular network. 647-656
Andrew T. Smith, Charles Elkan:
Making generative classifiers robust to selection bias. 657-666
Xiuyao Song, Mingxi Wu, Christopher M. Jermaine, Sanjay Ranka:
Statistical change detection for multi-dimensional data. 667-676
Rohini K. Srihari, Li Xu, Tushar Saxena:
Use of ranked cross document evidence trails for hypothesis generation. 677-686
Jimeng Sun, Christos Faloutsos, Spiros Papadimitriou, Philip S. Yu:
GraphScope: parameter-free mining of large time-evolving graphs. 687-696
Gaurav Tandon, Philip K. Chan:
Weighting versus pruning in rule validation for detecting network and host anomalies. 697-706
Wei Tang, Hui Xiong, Shi Zhong, Jie Wu:
Enhancing semi-supervised clustering: a feature projection perspective. 707-716
Chayant Tantipathananandh, Tanya Y. Berger-Wolf, David Kempe:
A framework for community identification in dynamic social networks. 717-726
Choon Hui Teo, Alex J. Smola, S. V. N. Vishwanathan, Quoc V. Le:
A scalable modular convex solver for regularized risk minimization. 727-736
Hanghang Tong, Christos Faloutsos, Brian Gallagher, Tina Eliassi-Rad:
Fast best-effort pattern matching in large attributed graphs. 737-746
Hanghang Tong, Christos Faloutsos, Yehuda Koren:
Fast direction-aware proximity for graph mining. 747-756
David S. Vogel, Ognian Asparouhov, Tobias Scheffer:
Scalable look-ahead linear regression trees. 757-764
Jilles Vreeken, Matthijs van Leeuwen, Arno Siebes:
Characterising the difference. 765-774
Li Wan, Wee Keong Ng, Shuguo Han, Vincent C. S. Lee:
Privacy-preservation for gradient descent methods. 775-783
Xuanhui Wang, ChengXiang Zhai, Xiao Hu, Richard Sproat:
Mining correlated bursty topic patterns from coordinated text streams. 784-793
Xuerui Wang, Chris Pal, Andrew McCallum:
Generalized component analysis for text with heterogeneous attributes. 794-803
Raymond Chi-Wing Wong, Jian Pei, Ada Wai-Chee Fu, Ke Wang:
Mining favorable facets. 804-813
Junjie Wu, Hui Xiong, Peng Wu, Jian Chen:
Local decomposition for rare class analysis. 814-823
Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, Thomas A. J. Schweiger:
SCAN: a structural clustering algorithm for networks. 824-833
Rong Yan, Jelena Tesic, John R. Smith:
Model-shared subspace boosting for multi-label classification. 834-843
Dragomir Yankov, Eamonn J. Keogh, Jose Medina, Bill Yuan-chi Chiu, Victor B. Zordan:
Detecting time series motifs under uniform scaling. 844-853
Jieping Ye, Shuiwang Ji, Jianhui Chen:
Learning the kernel matrix in discriminant analysis via quadratically constrained quadratic programming. 854-863
Junsong Yuan, Ying Wu, Ming Yang:
From frequent itemsets to semantically meaningful visual patterns. 864-873
Xian Zhang, Yu Hao, Xiaoyan Zhu, Ming Li, David R. Cheriton:
Information distance from a question to an answer. 874-883
Hongkun Zhao, Weiyi Meng, Clement T. Yu:
Mining templates from search result records of search engines. 884-893
Shuyi Zheng, Ruihua Song, Ji-Rong Wen, Di Wu:
Joint optimization of wrapper generation and template detection. 894-902
Jun Zhu, Bo Zhang, Zaiqing Nie, Ji-Rong Wen, Hsiao-Wuen Hon:
Webpage understanding: an integrated approach. 903-912

Industrial and government track papers

Sitaram Asur, Srinivasan Parthasarathy, Duygu Ucar:
An event-based framework for characterizing the evolutionary behavior of interaction graphs. 913-921
Rebecca Castaño, Kiri Wagstaff, Steve A. Chien, Timothy M. Stough, Benyang Tang:
On-board analysis of uncalibrated data for a spacecraft at Mars. 922-930
Andrew Fast, Lisa Friedland, Marc Maier, Brian Taylor, David Jensen, Henry G. Goldberg, John Komoroske:
Relational data pre-processing techniques for improved securities fraud detection. 941-949
Ming Hua, Jian Pei:
Cleaning disguised missing data: a heuristic approach. 950-958
Ron Kohavi, Randal M. Henne, Dan Sommerfield:
Practical guide to controlled experiments on the web: listen to your customers not to the HiPPO. 959-967
Ping Luo, Hui Xiong, Kevin Lü, Zhongzhi Shi:
Distributed classification in peer-to-peer networks. 968-976
Claudia Perlich, Saharon Rosset, Richard D. Lawrence, Bianca Zadrozny:
High-quantile modeling for customer wallet estimation and other applications. 977-985
Jun Hua Zhao, Zhao Yang Dong, Pei Zhang:
Mining complex power networks for blackout prevention. 986-994
Shubin Zhao, Jonathan Betz:
Corroborate and learn facts from the web. 995-1003
Guangyu Zhu, Timothy J. Bethea, Vikas Krishna:
Extracting relevant named entities for automated expense reimbursement. 1004-1012

Industrial and government track short papers

Charu C. Aggarwal:
A framework for classification and segmentation of massive audio data streams. 1013-1017
Chris Curry, Robert L. Grossman, David Locke, Steve Vejcik, Joseph Bugajski:
Detecting changes in large data sets of payment card data: a case study. 1018-1022
Rong Pan, Junhui Zhao, Vincent Wenchen Zheng, Jeffrey Junfeng Pan, Dou Shen, Sinno Jialin Pan, Qiang Yang:
Domain-constrained semi-supervised mining of tracking models in sensor networks. 1023-1027
Wei Peng, Charles Perng, Tao Li, Haixun Wang:
Event summarization for system management. 1028-1032
R. Bharat Rao, Jinbo Bi, Glenn Fung, Marcos Salganicoff, Nancy Obuchowski, David P. Naidich:
LungCAD: a clinically approved, machine learning system for lung cancer detection. 1033-1037
Robert J. Yan, Charles X. Ling:
Machine learning for stock selection. 1038-1042
Yanfang Ye, Dingding Wang, Tao Li, Dongyi Ye:
IMDS: intelligent malware detection system. 1043-1047
Xiaoxin Yin, Jiawei Han, Philip S. Yu:
Truth discovery with multiple conflicting information providers on the web. 1048-1052

Panel

Srinivasan Parthasarathy:
Data mining at the crossroads: successes, failures and learning from them. 1053-1055