15. KDD 2009: Paris, France

John F. Elder IV, Françoise Fogelman-Soulié, Peter A. Flach, Mohammed Javeed Zaki (Eds.): Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, June 28 - July 1, 2009. ACM 2009, ISBN 978-1-60558-495-9

Keynote talks

David J. Hand:
Mismatched models, wrong results, and dreadful decisions: on choosing appropriate data mining tools. 1-2
Ravi Kumar:
Mining web logs: applications and challenges. 3-4
Heikki Mannila:
Randomization methods in data mining. 5-6
Ashok N. Srivastava:
Data mining at NASA: from theory to applications. 7-8
Stanley Wasserman:
Network science: an introduction to recent statistical approaches. 9-10

Panel

Michael Zeller, Robert Grossman, Christoph Lingenfelder, Michael R. Berthold, Erik Marcade, Rick Pechter, Mike Hoskins, Wayne Thompson, Rich Holada:
Open standards and cloud computing: KDD-2009 panel report. 11-18

Research track papers

Deepak Agarwal, Bee-Chung Chen:
Regression-based latent factor models. 19-28
Charu C. Aggarwal, Yan Li, Jianyong Wang, Jing Wang:
Frequent pattern mining with uncertain data. 29-38
Amr Ahmed, Eric P. Xing, William W. Cohen, Robert F. Murphy:
Structured correspondence topic models for mining captioned figures in biological literature. 39-48
Anurag Ambekar, Charles B. Ward, Jahangir Mohammed, Swapna Male, Steven Skiena:
Name-ethnicity classification from open sources. 49-58
Shin Ando, Einoshin Suzuki:
Detection of unique temporal segments by information theoretic meta-clustering. 59-68
Mafruz Zaman Ashrafi, See-Kiong Ng:
Collusion-resistant anonymous data collection method. 69-78
Sitaram Asur, Srinivasan Parthasarathy:
A viewpoint-based approach for interaction graph analysis. 79-88
Lars Backstrom, Jon M. Kleinberg, Ravi Kumar:
Optimizing web traffic via the media scheduling problem. 89-98
Ron Bekkerman, Martin Scholz, Krishnamurthy Viswanathan:
Improving clustering stability with combinatorial MRFs. 99-108
Michele Berlingerio, Fabio Pinelli, Mirco Nanni, Fosca Giannotti:
Temporal mining for interactive workflow data analysis. 109-118
Thomas Bernecker, Hans-Peter Kriegel, Matthias Renz, Florian Verhein, Andreas Züfle:
Probabilistic frequent itemset mining in uncertain databases. 119-128
Alina Beygelzimer, John Langford:
The offset tree for learning with partial labels. 129-138
Albert Bifet, Geoffrey Holmes, Bernhard Pfahringer, Richard Kirkby, Ricard Gavaldà:
New ensemble methods for evolving data streams. 139-148
Christian Böhm, Katrin Haegler, Nikola S. Müller, Claudia Plant:
CoCo: coding cost for parameter-free outlier detection. 149-158
Yingyi Bu, Lei Chen, Ada Wai-Chee Fu, Dawei Liu:
Efficient anomaly monitoring over moving object trajectory streams. 159-168
Jonathan Chang, Jordan L. Boyd-Graber, David M. Blei:
Connections between the lines: augmenting social networks with text. 169-178
Bo Chen, Wai Lam, Ivor Tsang, Tak-Lam Wong:
Extracting discriminative concepts for domain adaptation in text mining. 179-188
Minmin Chen, Yixin Chen, Michael R. Brent, Aaron E. Tenney:
Constrained optimization for validation-guided conditional random field learning. 189-198
Wei Chen, Yajun Wang, Siyu Yang:
Efficient influence maximization in social networks. 199-208
Ye Chen, Dmitry Pavlov, John F. Canny:
Large-scale behavioral targeting. 209-218
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, Prabhakar Raghavan:
On compressing social networks. 219-228
Erick Delage:
Regret-based online ranking for a growing digital library. 229-238
Hongbo Deng, Michael R. Lyu, Irwin King:
A generalized Co-HITS algorithm and its application to bipartite graphs. 239-248
Meghana Deodhar, Joydeep Ghosh:
Mining for the most certain predictions from dyadic data. 249-258
Pinar Donmez, Jaime G. Carbonell, Jeff Schneider:
Efficiently learning the accuracy of labeling sources for selective sampling. 259-268
Nan Du, Christos Faloutsos, Bai Wang, Leman Akoglu:
Large human communication networks: patterns and a utility-driven generator. 269-278
Murat Dundar, E. Daniel Hirleman, Arun K. Bhunia, J. Paul Robinson, Bartek Rajwa:
Learning with a non-exhaustive training dataset: a case study: detection of bacteria cultures using optical-scattering technology. 279-288
Khalid El-Arini, Gaurav Veda, Dafna Shahaf, Carlos Guestrin:
Turning down the noise in the blogosphere. 289-298
George Forman, Martin Scholz, Shyamsundar Rajaram:
Feature shaping for linear SVM classifiers. 299-308
Richard Frank, Martin Ester, Arno Knobbe:
A multi-relational approach to spatial classification. 309-318
Antonino Freno, Edmondo Trentin, Marco Gori:
Scalable pseudo-likelihood estimation in hybrid random fields. 319-328
João Gama, Raquel Sebastião, Pedro Pereira Rodrigues:
Issues in evaluation of stream learning algorithms. 329-338
Jing Gao, Wei Fan, Yizhou Sun, Jiawei Han:
Heterogeneous source consensus learning via decision propagation and negotiation. 339-348
Yong Ge, Hui Xiong, Wenjun Zhou, Ramendra K. Sahoo, Xiaofeng Gao, Weili Wu:
Multi-focal learning and its application to customer service support. 349-358
Quanquan Gu, Jie Zhou:
Co-clustering on manifolds. 359-368
Lei Guo, Enhua Tan, Songqing Chen, Xiaodong Zhang, Yihong Eric Zhao:
Analyzing patterns of user content generation in online social networks. 369-378
Sami Hanhijärvi, Markus Ojala, Niko Vuokko, Kai Puolamäki, Nikolaj Tatti, Heikki Mannila:
Tell me something I don't know: randomization strategies for iterative data mining. 379-388
Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, Xiaohua Zhou:
Exploiting Wikipedia as external knowledge for document clustering. 389-396
Mohsen Jamali, Martin Ester:
TrustWalker: a random walk model for combining trust-based and item-based recommendation. 397-406
Shuiwang Ji, Lei Yuan, Ying-Xin Li, Zhi-Hua Zhou, Sudhir Kumar, Jieping Ye:
Drosophila gene expression pattern annotation using sparse features and term-term interactions. 407-416
Ruoming Jin, Yang Xiang, Lin Liu:
Cartesian contour: a concise representation for a collection of frequent sets. 417-426
Aleksander Kolcz, Gordon V. Cormack:
Genre-based decomposition of email class noise. 427-436
Arne Koopman, Arno Siebes:
Characteristic relational patterns. 437-446
Yehuda Koren:
Collaborative filtering with temporal dynamics. 447-456
Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, Soumen Chakrabarti:
Collective annotation of Wikipedia entities in web text. 457-466
Theodoros Lappas, Kun Liu, Evimaria Terzi:
Finding a team of experts in social networks. 467-476
Theodoros Lappas, Benjamin Arai, Manolis Platakis, Dimitrios Kotsakos, Dimitrios Gunopulos:
On burstiness-aware search for document sequences. 477-486
Mark Last:
Improving data mining utility with projective sampling. 487-496
Jure Leskovec, Lars Backstrom, Jon M. Kleinberg:
Meme-tracking and the dynamics of the news cycle. 497-506
Lei Li, James McCann, Nancy S. Pollard, Christos Faloutsos:
DynaMMo: mining and summarization of coevolving sequences with missing values. 507-516
Tiancheng Li, Ninghui Li:
On the tradeoff between privacy and utility in data publishing. 517-526
Yu-Ru Lin, Jimeng Sun, Paul Castro, Ravi B. Konuru, Hari Sundaram, Aisling Kelliher:
MetaFac: community discovery via relational hypergraph factorization. 527-536
Chao Liu, Fan Guo, Christos Faloutsos:
BBM: Bayesian browsing model from petabyte-scale data. 537-546
Jun Liu, Jianhui Chen, Jieping Ye:
Large-scale sparse logistic regression. 547-556
David Lo, Hong Cheng, Jiawei Han, Siau-Cheng Khoo, Chengnian Sun:
Classification of software behaviors for failure detection: a discriminative pattern mining approach. 557-566
Steven Loscalzo, Lei Yu, Chris H. Q. Ding:
Consensus group stable feature selection. 567-576
Aurelie C. Lozano, Naoki Abe, Yan Liu, Saharon Rosset:
Grouped graphical Granger modeling methods for temporal causal modeling. 577-586
Aurelie C. Lozano, Hongfei Li, Alexandru Niculescu-Mizil, Yan Liu, Claudia Perlich, Jonathan R. M. Hosking, Naoki Abe:
Spatial-temporal causal modeling for climate change attribution. 587-596
Sofus A. Macskassy:
Using graph-based metrics with empirical risk minimization to speed up active learning on networked data. 597-606
R. Dean Malmgren, Jake M. Hofman, Luis A. N. Amaral, Duncan J. Watts:
Characterizing individual communication patterns. 607-616
Andreas Maunz, Christoph Helma, Stefan Kramer:
Large-scale graph mining using backbone refinement classes. 617-626
Frank McSherry, Ilya Mironov:
Differentially private recommender systems: building privacy into the net. 627-636
Anna Monreale, Fabio Pinelli, Roberto Trasarti, Fosca Giannotti:
WhereNext: a location predictor on trajectory pattern mining. 637-646
Siegfried Nijssen, Tias Guns, Luc De Raedt:
Correlated itemset mining in ROC space: a constraint programming approach. 647-656
Kensuke Onuma, Hanghang Tong, Christos Faloutsos:
TANGENT: a novel, 'Surprise me', recommendation algorithm. 657-666
Rong Pan, Martin Scholz:
Mind the gaps: weighting the unknown in large-scale one-class collaborative filtering. 667-676
Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers, Vipin Kumar:
An association analysis approach to biclustering. 677-686
Ardian Kristanto Poernomo, Vivekanand Gopalkrishnan:
CP-summary: a concise representation for browsing frequent itemsets. 687-696
Ardian Kristanto Poernomo, Vivekanand Gopalkrishnan:
Towards efficient mining of proportional fault-tolerant frequent itemsets. 697-706
Foster J. Provost, Brian Dalessandro, Rod Hook, Xiaohan Zhang, Alan Murray:
Audience selection for on-line brand advertising: privacy-friendly social network targeting. 707-716
Zijie Qi, Ian Davidson:
A principled and flexible framework for finding alternative clusterings. 717-726
Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, Lars Schmidt-Thieme:
Learning optimal ranking with tensor factorization for tag recommendation. 727-736
Venu Satuluri, Srinivasan Parthasarathy:
Scalable graph clustering using stochastic flows: applications to community discovery. 737-746
Jerry Scripps, Pang-Ning Tan, Abdol-Hossein Esfahanian:
Measuring the effects of preprocessing decisions and network forces in dynamic network analysis. 747-756
Bao-Hong Shen, Shuiwang Ji, Jieping Ye:
Mining discrete patterns via binary matrix factorization. 757-766
Lei Shi, Vandana Pursnani Janeja:
Anomalous window discovery through scan statistics for linear intersecting paths (SSLIP). 767-776
Xiaolin Shi, Jun Zhu, Rui Cai, Lei Zhang:
User grouping behavior in online forums. 777-786
Takashi Shibuya, Tatsuya Harada, Yasuo Kuniyoshi:
Causality quantification and its applications: structuring and modeling of multivariate time series. 787-796
Yizhou Sun, Yintao Yu, Jiawei Han:
Ranking-based clustering of heterogeneous information networks with star network schema. 797-806
Jie Tang, Jimeng Sun, Chi Wang, Zi Yang:
Social influence analysis in large-scale networks. 807-816
Lei Tang, Huan Liu:
Relational learning via latent social dimensions. 817-826
Chayant Tantipathananandh, Tanya Y. Berger-Wolf:
Constant-factor approximation algorithms for identifying dynamic communities. 827-836
Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, Christos Faloutsos:
DOULION: counting triangles in massive graphs with a coin. 837-846
Pavan Vatturi, Weng-Keen Wong:
Category detection using hierarchical mean shift. 847-856
Ting Wang, Mudhakar Srivatsa, Dakshi Agrawal, Ling Liu:
Learning, indexing, and diagnosing network faults. 857-866
Xuanhui Wang, Deepayan Chakrabarti, Kunal Punera:
Mining broad latent query aspects from search sessions. 867-876
Junjie Wu, Hui Xiong, Jian Chen:
Adapting the right measures for K-means clustering. 877-886
Mingxi Wu, Xiuyao Song, Chris Jermaine, Sanjay Ranka, John Gums:
A LRT framework for fast spatial anomaly detection. 887-896
Jack Chongjie Xue, Gary M. Weiss:
Quantification and semi-supervised classification methods for handling changes in class distribution. 897-906
Donghui Yan, Ling Huang, Michael I. Jordan:
Fast approximate spectral clustering. 907-916
Bishan Yang, Jian-Tao Sun, Tengjiao Wang, Zheng Chen:
Effective multi-label active learning for text classification. 917-926
Tianbao Yang, Rong Jin, Yun Chi, Shenghuo Zhu:
Combining link and content for community detection: a discriminative approach. 927-936
Limin Yao, David M. Mimno, Andrew McCallum:
Efficient methods for topic model inference on streaming document collections. 937-946
Lexiang Ye, Eamonn J. Keogh:
Time series shapelets: a new primitive for data mining. 947-956
Zhijun Yin, Rui Li, Qiaozhu Mei, Jiawei Han:
Exploring social tagging graph for web object classification. 957-966
Shinjae Yoo, Yiming Yang, Frank Lin, Il-Chul Moon:
Mining social networks for personalized email prioritization. 967-976
Chang Hun You, Lawrence B. Holder, Diane J. Cook:
Learning patterns in the dynamics of biological networks. 977-986
Xiangliang Zhang, Cyril Furtlehner, Julien Perez, Cécile Germain-Renaud, Michèle Sebag:
Toward autonomic grids: analyzing the job flow with affinity streaming. 987-996
Yuzhou Zhang, Jianyong Wang, Yi Wang, Lizhu Zhou:
Parallel community detection on large networks with propinquity dynamics. 997-1006
Elena Zheleva, Hossam Sharara, Lise Getoor:
Co-evolution of social and affiliation networks. 1007-1016
Lei Zheng, Shaojun Wang, Yan Liu, Chi-Hoon Lee:
Information theoretic regularization for semi-supervised boosting. 1017-1026
ErHeng Zhong, Wei Fan, Jing Peng, Kun Zhang, Jiangtao Ren, Deepak S. Turaga, Olivier Verscheure:
Cross domain distribution adaptation via kernel mapping. 1027-1036
Guangyu Zhu, Gilad Mishne:
Mining rich session context to improve web search. 1037-1046
Jun Zhu, Eric P. Xing, Bo Zhang:
Primal sparse Max-margin Markov networks. 1047-1056
Qiang Zhu, Xiaoyue Wang, Eamonn J. Keogh, Sang-Hee Lee:
Augmenting the generalized Hough transform to enable the mining of petroglyphs. 1057-1066

Industrial track papers

Josh Attenberg, Sandeep Pandey, Torsten Suel:
Modeling and predicting user behavior in sponsored search. 1067-1076
Indrajit Bhattacharya, Shantanu Godbole, Ajay Gupta, Ashish Verma, Jeff Achtermann, Kevin English:
Enabling analysts in managed services for CRM analytics. 1077-1086
Ludmila Cherkasova, Kave Eshghi, Charles B. Morrey, Joseph Tucek, Alistair C. Veitch:
Applying syntactic similarity algorithms for enterprise information management. 1087-1096
Wei Chu, Seung-Taek Park, Todd Beaupre, Nitin Motgi, Amit Phadke, Seinjuti Chakraborty, Joe Zachariah:
A case study of behavior-driven conjoint analysis on Yahoo!: front page today module. 1097-1104
Thomas Crook, Brian Frasca, Ron Kohavi, Roger Longbotham:
Seven pitfalls to avoid when running controlled experiments on the web. 1105-1114
Srivatsava Daruru, Nena M. Marin, Matt Walker, Joydeep Ghosh:
Pervasive parallelism in data mining: dataflow solution to co-clustering large and sparse Netflix data. 1115-1124
Xiaowen Ding, Bing Liu, Lei Zhang:
Entity discovery and assignment for opinion mining applications. 1125-1134
Xiaoxi Du, Ruoming Jin, Liang Ding, Victor E. Lee, John H. Thornton Jr.:
Migration motif: a spatial-temporal pattern mining approach for financial markets. 1135-1144
Ariel Fuxman, Anitha Kannan, Andrew B. Goldberg, Rakesh Agrawal, Panayiotis Tsaparas, John C. Shafer:
Improving classification accuracy using automatically extracted training data. 1145-1154
Honglei Guo, Huijia Zhu, Zhili Guo, Xiaoxun Zhang, Zhong Su:
Address standardization with latent semantic association. 1155-1164
Sonal Gupta, Mikhail Bilenko, Matthew Richardson:
Catching the drift: learning broad matches from clickthrough data. 1165-1174
Mohammad Al Hasan, W. Scott Spangler, Thomas D. Griffin, Alfredo Alba:
COA: finding novel patents through text analysis. 1175-1184
Shunsuke Hirose, Kenji Yamanishi, Takayuki Nakata, Ryohei Fujimaki:
Network anomaly detection based on Eigen equation compression. 1185-1194
Wei Jin, Hung Hay Ho, Rohini K. Srihari:
OpinionMiner: a novel machine learning system for web opinion mining and extraction. 1195-1204
Jongwuk Lee, Seung-won Hwang, Zaiqing Nie, Ji-Rong Wen:
Query result clustering for object-level search. 1205-1214
Ming Li, M. Benjamin Dias, Ian H. Jarman, Wael El-Deredy, Paulo J. G. Lisboa:
Grocery shopping recommendations based on basket-sensitive random walk. 1215-1224
Yan Liu, Jayant R. Kalagnanam, Oivind Johnsen:
Learning dynamic temporal graphs for oil-production equipment monitoring system. 1225-1234
Ping Luo, Fen Lin, Yuhong Xiong, Yong Zhao, Zhongzhi Shi:
Towards combining web classification and web information extraction: a case study. 1235-1244
Justin Ma, Lawrence K. Saul, Stefan Savage, Geoffrey M. Voelker:
Beyond blacklists: learning to detect malicious web sites from suspicious URLs. 1245-1254
Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios:
Clustering event logs using iterative partitioning. 1255-1264
Mary McGlohon, Stephen Bay, Markus G. Anderle, David M. Steier, Christos Faloutsos:
SNARE: a link analytic system for graph labeling and risk detection. 1265-1274
Prem Melville, Wojciech Gryc, Richard D. Lawrence:
Sentiment analysis of blogs by combining lexical knowledge with text classification. 1275-1284
Noman Mohammed, Benjamin C. M. Fung, Patrick C. K. Hung, Cheuk-kwong Lee:
Anonymizing healthcare data: a case study on the blood transfusion service. 1285-1294
Kivanc M. Ozonat, Donald Young:
Towards a universal marketplace over the web: statistical multi-label classification of service provider forms with simulated annealing. 1295-1304
Debprakash Patnaik, Manish Marwah, Ratnesh K. Sharma, Naren Ramakrishnan:
Sustainable operation and management of data center chillers using temporal data mining. 1305-1314
B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis Faloutsos, Christos Faloutsos:
BGP-lens: patterns and anomalies in internet routing updates. 1315-1324
D. Sculley, Robert G. Malkin, Sugato Basu, Roberto J. Bayardo:
Predicting bounce rates in sponsored search advertisements. 1325-1334
Liang Sun, Rinkal Patel, Jun Liu, Kewei Chen, Teresa Wu, Jing Li, Eric Reiman, Jieping Ye:
Mining brain region connectivity for Alzheimer's disease study via sparse inverse covariance estimation. 1335-1344
Junfeng Wang, Chun Chen, Can Wang, Jian Pei, Jiajun Bu, Ziyu Guan, Wei Vivian Zhang:
Can we learn a template-independent wrapper for news article extraction from a single training site? 1345-1354
Kuansan Wang, Toby Walker, Zijian Zheng:
PSkip: estimating relevance ranking quality from web search clickthrough data. 1355-1364
Gu Xu, Shuang-Hong Yang, Hang Li:
Named entity mining from click-through data using weakly supervised latent Dirichlet allocation. 1365-1374
Jiang-Ming Yang, Rui Cai, Chunsong Wang, Hua Huang, Lei Zhang, Wei-Ying Ma:
Incorporating site-level knowledge for incremental crawling of web forums: a list-wise strategy. 1375-1384
Yanfang Ye, Tao Li, Qingshan Jiang, Zhixue Han, Li Wan:
Intelligent file scoring system for malware detection from the gray list. 1385-1394
Bin Zhou, Daxin Jiang, Jian Pei, Hang Li:
OLAP on search logs: an infrastructure supporting data-driven applications in search engines. 1395-1404