15. KDD 2009: Paris, France
 John F. Elder IV, Françoise Fogelman-Soulié, Peter A. Flach, Mohammed Javeed Zaki (Eds.):
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, June 28 - July 1, 2009.
 ACM 2009, ISBN 978-1-60558-495-9
 
Keynote talks
 
- David J. Hand:
 Mismatched models, wrong results, and dreadful decisions: on choosing appropriate data mining tools.
1-2
             
- Ravi Kumar:
 Mining web logs: applications and challenges.
3-4
             
- Heikki Mannila:
 Randomization methods in data mining.
5-6
             
- Ashok N. Srivastava:
 Data mining at NASA: from theory to applications.
7-8
             
- Stanley Wasserman:
 Network science: an introduction to recent statistical approaches.
9-10
             
Panel
 
Research track papers
 
- Deepak Agarwal, Bee-Chung Chen:
 Regression-based latent factor models.
19-28
             
- Charu C. Aggarwal, Yan Li, Jianyong Wang, Jing Wang:
 Frequent pattern mining with uncertain data.
29-38
             
- Amr Ahmed, Eric P. Xing, William W. Cohen, Robert F. Murphy:
 Structured correspondence topic models for mining captioned figures in biological literature.
39-48
             
- Anurag Ambekar, Charles B. Ward, Jahangir Mohammed, Swapna Male, Steven Skiena:
 Name-ethnicity classification from open sources.
49-58
             
- Shin Ando, Einoshin Suzuki:
 Detection of unique temporal segments by information theoretic meta-clustering.
59-68
             
- Mafruz Zaman Ashrafi, See-Kiong Ng:
 Collusion-resistant anonymous data collection method.
69-78
             
- Sitaram Asur, Srinivasan Parthasarathy:
 A viewpoint-based approach for interaction graph analysis.
79-88
             
- Lars Backstrom, Jon M. Kleinberg, Ravi Kumar:
 Optimizing web traffic via the media scheduling problem.
89-98
             
- Ron Bekkerman, Martin Scholz, Krishnamurthy Viswanathan:
 Improving clustering stability with combinatorial MRFs.
99-108
             
- Michele Berlingerio, Fabio Pinelli, Mirco Nanni, Fosca Giannotti:
 Temporal mining for interactive workflow data analysis.
109-118
             
- Thomas Bernecker, Hans-Peter Kriegel, Matthias Renz, Florian Verhein, Andreas Züfle:
 Probabilistic frequent itemset mining in uncertain databases.
119-128
             
- Alina Beygelzimer, John Langford:
 The offset tree for learning with partial labels.
129-138
             
- Albert Bifet, Geoffrey Holmes, Bernhard Pfahringer, Richard Kirkby, Ricard Gavaldà:
 New ensemble methods for evolving data streams.
139-148
             
- Christian Böhm, Katrin Haegler, Nikola S. Müller, Claudia Plant:
 CoCo: coding cost for parameter-free outlier detection.
149-158
             
- Yingyi Bu, Lei Chen, Ada Wai-Chee Fu, Dawei Liu:
 Efficient anomaly monitoring over moving object trajectory streams.
159-168
             
- Jonathan Chang, Jordan L. Boyd-Graber, David M. Blei:
 Connections between the lines: augmenting social networks with text.
169-178
             
- Bo Chen, Wai Lam, Ivor Tsang, Tak-Lam Wong:
 Extracting discriminative concepts for domain adaptation in text mining.
179-188
             
- Minmin Chen, Yixin Chen, Michael R. Brent, Aaron E. Tenney:
 Constrained optimization for validation-guided conditional random field learning.
189-198
             
- Wei Chen, Yajun Wang, Siyu Yang:
 Efficient influence maximization in social networks.
199-208
             
- Ye Chen, Dmitry Pavlov, John F. Canny:
 Large-scale behavioral targeting.
209-218
             
- Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, Prabhakar Raghavan:
 On compressing social networks.
219-228
             
- Erick Delage:
 Regret-based online ranking for a growing digital library.
229-238
             
- Hongbo Deng, Michael R. Lyu, Irwin King:
 A generalized Co-HITS algorithm and its application to bipartite graphs.
239-248
             
- Meghana Deodhar, Joydeep Ghosh:
 Mining for the most certain predictions from dyadic data.
249-258
             
- Pinar Donmez, Jaime G. Carbonell, Jeff Schneider:
 Efficiently learning the accuracy of labeling sources for selective sampling.
259-268
             
- Nan Du, Christos Faloutsos, Bai Wang, Leman Akoglu:
 Large human communication networks: patterns and a utility-driven generator.
269-278
             
- Murat Dundar, E. Daniel Hirleman, Arun K. Bhunia, J. Paul Robinson, Bartek Rajwa:
 Learning with a non-exhaustive training dataset: a case study: detection of bacteria cultures using optical-scattering technology.
279-288
             
- Khalid El-Arini, Gaurav Veda, Dafna Shahaf, Carlos Guestrin:
 Turning down the noise in the blogosphere.
289-298
             
- George Forman, Martin Scholz, Shyamsundar Rajaram:
 Feature shaping for linear SVM classifiers.
299-308
             
- Richard Frank, Martin Ester, Arno Knobbe:
 A multi-relational approach to spatial classification.
309-318
             
- Antonino Freno, Edmondo Trentin, Marco Gori:
 Scalable pseudo-likelihood estimation in hybrid random fields.
319-328
             
- João Gama, Raquel Sebastião, Pedro Pereira Rodrigues:
 Issues in evaluation of stream learning algorithms.
329-338
             
- Jing Gao, Wei Fan, Yizhou Sun, Jiawei Han:
 Heterogeneous source consensus learning via decision propagation and negotiation.
339-348
             
- Yong Ge, Hui Xiong, Wenjun Zhou, Ramendra K. Sahoo, Xiaofeng Gao, Weili Wu:
 Multi-focal learning and its application to customer service support.
349-358
             
- Quanquan Gu, Jie Zhou:
 Co-clustering on manifolds.
359-368
             
- Lei Guo, Enhua Tan, Songqing Chen, Xiaodong Zhang, Yihong Eric Zhao:
 Analyzing patterns of user content generation in online social networks.
369-378
             
- Sami Hanhijärvi, Markus Ojala, Niko Vuokko, Kai Puolamäki, Nikolaj Tatti, Heikki Mannila:
 Tell me something I don't know: randomization strategies for iterative data mining.
379-388
             
- Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, Xiaohua Zhou:
 Exploiting Wikipedia as external knowledge for document clustering.
389-396
             
- Mohsen Jamali, Martin Ester:
 TrustWalker: a random walk model for combining trust-based and item-based recommendation.
397-406
             
- Shuiwang Ji, Lei Yuan, Ying-Xin Li, Zhi-Hua Zhou, Sudhir Kumar, Jieping Ye:
 Drosophila gene expression pattern annotation using sparse features and term-term interactions.
407-416
             
- Ruoming Jin, Yang Xiang, Lin Liu:
 Cartesian contour: a concise representation for a collection of frequent sets.
417-426
             
- Aleksander Kolcz, Gordon V. Cormack:
 Genre-based decomposition of email class noise.
427-436
             
- Arne Koopman, Arno Siebes:
 Characteristic relational patterns.
437-446
             
- Yehuda Koren:
 Collaborative filtering with temporal dynamics.
447-456
             
- Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, Soumen Chakrabarti:
 Collective annotation of Wikipedia entities in web text.
457-466
             
- Theodoros Lappas, Kun Liu, Evimaria Terzi:
 Finding a team of experts in social networks.
467-476
             
- Theodoros Lappas, Benjamin Arai, Manolis Platakis, Dimitrios Kotsakos, Dimitrios Gunopulos:
 On burstiness-aware search for document sequences.
477-486
             
- Mark Last:
 Improving data mining utility with projective sampling.
487-496
             
- Jure Leskovec, Lars Backstrom, Jon M. Kleinberg:
 Meme-tracking and the dynamics of the news cycle.
497-506
             
- Lei Li, James McCann, Nancy S. Pollard, Christos Faloutsos:
 DynaMMo: mining and summarization of coevolving sequences with missing values.
507-516
             
- Tiancheng Li, Ninghui Li:
 On the tradeoff between privacy and utility in data publishing.
517-526
             
- Yu-Ru Lin, Jimeng Sun, Paul Castro, Ravi B. Konuru, Hari Sundaram, Aisling Kelliher:
 MetaFac: community discovery via relational hypergraph factorization.
527-536
             
- Chao Liu, Fan Guo, Christos Faloutsos:
 BBM: Bayesian browsing model from petabyte-scale data.
537-546
             
- Jun Liu, Jianhui Chen, Jieping Ye:
 Large-scale sparse logistic regression.
547-556
             
- David Lo, Hong Cheng, Jiawei Han, Siau-Cheng Khoo, Chengnian Sun:
 Classification of software behaviors for failure detection: a discriminative pattern mining approach.
557-566
             
- Steven Loscalzo, Lei Yu, Chris H. Q. Ding:
 Consensus group stable feature selection.
567-576
             
- Aurelie C. Lozano, Naoki Abe, Yan Liu, Saharon Rosset:
 Grouped graphical Granger modeling methods for temporal causal modeling.
577-586
             
- Aurelie C. Lozano, Hongfei Li, Alexandru Niculescu-Mizil, Yan Liu, Claudia Perlich, Jonathan R. M. Hosking, Naoki Abe:
 Spatial-temporal causal modeling for climate change attribution.
587-596
             
- Sofus A. Macskassy:
 Using graph-based metrics with empirical risk minimization to speed up active learning on networked data.
597-606
             
- R. Dean Malmgren, Jake M. Hofman, Luis A. N. Amaral, Duncan J. Watts:
 Characterizing individual communication patterns.
607-616
             
- Andreas Maunz, Christoph Helma, Stefan Kramer:
 Large-scale graph mining using backbone refinement classes.
617-626
             
- Frank McSherry, Ilya Mironov:
 Differentially private recommender systems: building privacy into the net.
627-636
             
- Anna Monreale, Fabio Pinelli, Roberto Trasarti, Fosca Giannotti:
 WhereNext: a location predictor on trajectory pattern mining.
637-646
             
- Siegfried Nijssen, Tias Guns, Luc De Raedt:
 Correlated itemset mining in ROC space: a constraint programming approach.
647-656
             
- Kensuke Onuma, Hanghang Tong, Christos Faloutsos:
 TANGENT: a novel, 'Surprise me', recommendation algorithm.
657-666
             
- Rong Pan, Martin Scholz:
 Mind the gaps: weighting the unknown in large-scale one-class collaborative filtering.
667-676
             
- Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers, Vipin Kumar:
 An association analysis approach to biclustering.
677-686
             
- Ardian Kristanto Poernomo, Vivekanand Gopalkrishnan:
 CP-summary: a concise representation for browsing frequent itemsets.
687-696
             
- Ardian Kristanto Poernomo, Vivekanand Gopalkrishnan:
 Towards efficient mining of proportional fault-tolerant frequent itemsets.
697-706
             
- Foster J. Provost, Brian Dalessandro, Rod Hook, Xiaohan Zhang, Alan Murray:
 Audience selection for on-line brand advertising: privacy-friendly social network targeting.
707-716
             
- Zijie Qi, Ian Davidson:
 A principled and flexible framework for finding alternative clusterings.
717-726
             
- Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, Lars Schmidt-Thieme:
 Learning optimal ranking with tensor factorization for tag recommendation.
727-736
             
- Venu Satuluri, Srinivasan Parthasarathy:
 Scalable graph clustering using stochastic flows: applications to community discovery.
737-746
             
- Jerry Scripps, Pang-Ning Tan, Abdol-Hossein Esfahanian:
 Measuring the effects of preprocessing decisions and network forces in dynamic network analysis.
747-756
             
- Bao-Hong Shen, Shuiwang Ji, Jieping Ye:
 Mining discrete patterns via binary matrix factorization.
757-766
             
- Lei Shi, Vandana Pursnani Janeja:
 Anomalous window discovery through scan statistics for linear intersecting paths (SSLIP).
767-776
             
- Xiaolin Shi, Jun Zhu, Rui Cai, Lei Zhang:
 User grouping behavior in online forums.
777-786
             
- Takashi Shibuya, Tatsuya Harada, Yasuo Kuniyoshi:
 Causality quantification and its applications: structuring and modeling of multivariate time series.
787-796
             
- Yizhou Sun, Yintao Yu, Jiawei Han:
 Ranking-based clustering of heterogeneous information networks with star network schema.
797-806
             
- Jie Tang, Jimeng Sun, Chi Wang, Zi Yang:
 Social influence analysis in large-scale networks.
807-816
             
- Lei Tang, Huan Liu:
 Relational learning via latent social dimensions.
817-826
             
- Chayant Tantipathananandh, Tanya Y. Berger-Wolf:
 Constant-factor approximation algorithms for identifying dynamic communities.
827-836
             
- Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, Christos Faloutsos:
 DOULION: counting triangles in massive graphs with a coin.
837-846
             
- Pavan Vatturi, Weng-Keen Wong:
 Category detection using hierarchical mean shift.
847-856
             
- Ting Wang, Mudhakar Srivatsa, Dakshi Agrawal, Ling Liu:
 Learning, indexing, and diagnosing network faults.
857-866
             
- Xuanhui Wang, Deepayan Chakrabarti, Kunal Punera:
 Mining broad latent query aspects from search sessions.
867-876
             
- Junjie Wu, Hui Xiong, Jian Chen:
 Adapting the right measures for K-means clustering.
877-886
             
- Mingxi Wu, Xiuyao Song, Chris Jermaine, Sanjay Ranka, John Gums:
 A LRT framework for fast spatial anomaly detection.
887-896
             
- Jack Chongjie Xue, Gary M. Weiss:
 Quantification and semi-supervised classification methods for handling changes in class distribution.
897-906
             
- Donghui Yan, Ling Huang, Michael I. Jordan:
 Fast approximate spectral clustering.
907-916
             
- Bishan Yang, Jian-Tao Sun, Tengjiao Wang, Zheng Chen:
 Effective multi-label active learning for text classification.
917-926
             
- Tianbao Yang, Rong Jin, Yun Chi, Shenghuo Zhu:
 Combining link and content for community detection: a discriminative approach.
927-936
             
- Limin Yao, David M. Mimno, Andrew McCallum:
 Efficient methods for topic model inference on streaming document collections.
937-946
             
- Lexiang Ye, Eamonn J. Keogh:
 Time series shapelets: a new primitive for data mining.
947-956
             
- Zhijun Yin, Rui Li, Qiaozhu Mei, Jiawei Han:
 Exploring social tagging graph for web object classification.
957-966
             
- Shinjae Yoo, Yiming Yang, Frank Lin, Il-Chul Moon:
 Mining social networks for personalized email prioritization.
967-976
             
- Chang Hun You, Lawrence B. Holder, Diane J. Cook:
 Learning patterns in the dynamics of biological networks.
977-986
             
- Xiangliang Zhang, Cyril Furtlehner, Julien Perez, Cécile Germain-Renaud, Michèle Sebag:
 Toward autonomic grids: analyzing the job flow with affinity streaming.
987-996
             
- Yuzhou Zhang, Jianyong Wang, Yi Wang, Lizhu Zhou:
 Parallel community detection on large networks with propinquity dynamics.
997-1006
             
- Elena Zheleva, Hossam Sharara, Lise Getoor:
 Co-evolution of social and affiliation networks.
1007-1016
             
- Lei Zheng, Shaojun Wang, Yan Liu, Chi-Hoon Lee:
 Information theoretic regularization for semi-supervised boosting.
1017-1026
             
- ErHeng Zhong, Wei Fan, Jing Peng, Kun Zhang, Jiangtao Ren, Deepak S. Turaga, Olivier Verscheure:
 Cross domain distribution adaptation via kernel mapping.
1027-1036
             
- Guangyu Zhu, Gilad Mishne:
 Mining rich session context to improve web search.
1037-1046
             
- Jun Zhu, Eric P. Xing, Bo Zhang:
 Primal sparse Max-margin Markov networks.
1047-1056
             
- Qiang Zhu, Xiaoyue Wang, Eamonn J. Keogh, Sang-Hee Lee:
 Augmenting the generalized Hough transform to enable the mining of petroglyphs.
1057-1066
             
Industrial track papers
 
- Josh Attenberg, Sandeep Pandey, Torsten Suel:
 Modeling and predicting user behavior in sponsored search.
1067-1076
             
- Indrajit Bhattacharya, Shantanu Godbole, Ajay Gupta, Ashish Verma, Jeff Achtermann, Kevin English:
 Enabling analysts in managed services for CRM analytics.
1077-1086
             
- Ludmila Cherkasova, Kave Eshghi, Charles B. Morrey, Joseph Tucek, Alistair C. Veitch:
 Applying syntactic similarity algorithms for enterprise information management.
1087-1096
             
- Wei Chu, Seung-Taek Park, Todd Beaupre, Nitin Motgi, Amit Phadke, Seinjuti Chakraborty, Joe Zachariah:
 A case study of behavior-driven conjoint analysis on Yahoo!: front page today module.
1097-1104
             
- Thomas Crook, Brian Frasca, Ron Kohavi, Roger Longbotham:
 Seven pitfalls to avoid when running controlled experiments on the web.
1105-1114
             
- Srivatsava Daruru, Nena M. Marin, Matt Walker, Joydeep Ghosh:
 Pervasive parallelism in data mining: dataflow solution to co-clustering large and sparse Netflix data.
1115-1124
             
- Xiaowen Ding, Bing Liu, Lei Zhang:
 Entity discovery and assignment for opinion mining applications.
1125-1134
             
- Xiaoxi Du, Ruoming Jin, Liang Ding, Victor E. Lee, John H. Thornton Jr.:
 Migration motif: a spatial-temporal pattern mining approach for financial markets.
1135-1144
             
- Ariel Fuxman, Anitha Kannan, Andrew B. Goldberg, Rakesh Agrawal, Panayiotis Tsaparas, John C. Shafer:
 Improving classification accuracy using automatically extracted training data.
1145-1154
             
- Honglei Guo, Huijia Zhu, Zhili Guo, Xiaoxun Zhang, Zhong Su:
 Address standardization with latent semantic association.
1155-1164
             
- Sonal Gupta, Mikhail Bilenko, Matthew Richardson:
 Catching the drift: learning broad matches from clickthrough data.
1165-1174
             
- Mohammad Al Hasan, W. Scott Spangler, Thomas D. Griffin, Alfredo Alba:
 COA: finding novel patents through text analysis.
1175-1184
             
- Shunsuke Hirose, Kenji Yamanishi, Takayuki Nakata, Ryohei Fujimaki:
 Network anomaly detection based on Eigen equation compression.
1185-1194
             
- Wei Jin, Hung Hay Ho, Rohini K. Srihari:
 OpinionMiner: a novel machine learning system for web opinion mining and extraction.
1195-1204
             
- Jongwuk Lee, Seung-won Hwang, Zaiqing Nie, Ji-Rong Wen:
 Query result clustering for object-level search.
1205-1214
             
- Ming Li, M. Benjamin Dias, Ian H. Jarman, Wael El-Deredy, Paulo J. G. Lisboa:
 Grocery shopping recommendations based on basket-sensitive random walk.
1215-1224
             
- Yan Liu, Jayant R. Kalagnanam, Oivind Johnsen:
 Learning dynamic temporal graphs for oil-production equipment monitoring system.
1225-1234
             
- Ping Luo, Fen Lin, Yuhong Xiong, Yong Zhao, Zhongzhi Shi:
 Towards combining web classification and web information extraction: a case study.
1235-1244
             
- Justin Ma, Lawrence K. Saul, Stefan Savage, Geoffrey M. Voelker:
 Beyond blacklists: learning to detect malicious web sites from suspicious URLs.
1245-1254
             
- Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios:
 Clustering event logs using iterative partitioning.
1255-1264
             
- Mary McGlohon, Stephen Bay, Markus G. Anderle, David M. Steier, Christos Faloutsos:
 SNARE: a link analytic system for graph labeling and risk detection.
1265-1274
             
- Prem Melville, Wojciech Gryc, Richard D. Lawrence:
 Sentiment analysis of blogs by combining lexical knowledge with text classification.
1275-1284
             
- Noman Mohammed, Benjamin C. M. Fung, Patrick C. K. Hung, Cheuk-kwong Lee:
 Anonymizing healthcare data: a case study on the blood transfusion service.
1285-1294
             
- Kivanc M. Ozonat, Donald Young:
 Towards a universal marketplace over the web: statistical multi-label classification of service provider forms with simulated annealing.
1295-1304
             
- Debprakash Patnaik, Manish Marwah, Ratnesh K. Sharma, Naren Ramakrishnan:
 Sustainable operation and management of data center chillers using temporal data mining.
1305-1314
             
- B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis Faloutsos, Christos Faloutsos:
 BGP-lens: patterns and anomalies in internet routing updates.
1315-1324
             
- D. Sculley, Robert G. Malkin, Sugato Basu, Roberto J. Bayardo:
 Predicting bounce rates in sponsored search advertisements.
1325-1334
             
- Liang Sun, Rinkal Patel, Jun Liu, Kewei Chen, Teresa Wu, Jing Li, Eric Reiman, Jieping Ye:
 Mining brain region connectivity for Alzheimer's disease study via sparse inverse covariance estimation.
1335-1344
             
- Junfeng Wang, Chun Chen, Can Wang, Jian Pei, Jiajun Bu, Ziyu Guan, Wei Vivian Zhang:
 Can we learn a template-independent wrapper for news article extraction from a single training site?
1345-1354
             
- Kuansan Wang, Toby Walker, Zijian Zheng:
 PSkip: estimating relevance ranking quality from web search clickthrough data.
1355-1364
             
- Gu Xu, Shuang-Hong Yang, Hang Li:
 Named entity mining from click-through data using weakly supervised latent Dirichlet allocation.
1365-1374
             
- Jiang-Ming Yang, Rui Cai, Chunsong Wang, Hua Huang, Lei Zhang, Wei-Ying Ma:
 Incorporating site-level knowledge for incremental crawling of web forums: a list-wise strategy.
1375-1384
             
- Yanfang Ye, Tao Li, Qingshan Jiang, Zhixue Han, Li Wan:
 Intelligent file scoring system for malware detection from the gray list.
1385-1394
             
- Bin Zhou, Daxin Jiang, Jian Pei, Hang Li:
 OLAP on search logs: an infrastructure supporting data-driven applications in search engines.
1395-1404
             
Copyright © Fri Mar 12 17:18:02 2010 by Michael Ley (ley@uni-trier.de)