2. AND 2008: Singapore (SIGIR Workshop)

Daniel P. Lopresti, Shourya Roy, Klaus U. Schulz, L. Venkata Subramaniam (Eds.): Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data, AND 2008, Singapore, July 24, 2008. ACM International Conference Proceeding Series 303, ACM 2008, ISBN 978-1-60558-196-5

Donna Harman:
Some thoughts on failure analysis for noisy data.
John Tait:
Noise and information.
Laurianne Sitbon, Patrice Bellot:
How to cope with questions typed by dyslexic users. 1-8
Daniel P. Lopresti:
Optical character recognition errors and their effects on natural language processing. 9-16
Ulrich Reffle, Annette Gotscharek, Christoph Ringlstetter, Klaus U. Schulz:
Successfully detecting and correcting false friends using channel profiles. 17-22
Valentin Jijkoun, Mahboob Alam Khalid, Maarten Marx, Maarten de Rijke:
Named entity normalization in user generated content. 23-30
Rema Ananthanarayanan, Vijil Chenthamarakshan, Prasad M. Deshpande, Raghuram Krishnapuram:
Rule based synonyms for entity extraction from noisy text. 31-38
Jiyin He, Wouter Weerkamp, Martha Larson, Maarten de Rijke:
Blogger, stick to your story: modeling topical noise in blogs with coherence measures. 39-46
Robert McArthur:
Uncovering deep user context from blogs. 47-54
Jinfeng Zhuang, Steven C. H. Hoi, Aixin Sun:
On profiling blogs with representative entries. 55-62
Soumya Datta, Sudeshna Sarkar:
A comparative study of statistical features of language in blogs-vs-splogs. 63-66
Sreangsu Acharyya, Sumit Negi, L. Venkata Subramaniam, Shourya Roy:
Unsupervised learning of multilingual short message service (SMS) dialect from noisy examples. 67-74
Antti Järvelin, Tuomas Talvensaari, Anni Järvelin:
Data driven methods for improving mono- and cross-lingual IR performance in noisy environments. 75-82
Lipika Dey, S. K. Mirajul Haque:
Opinion mining from noisy text data. 83-90
Rachit Arora, Balaraman Ravindran:
Latent Dirichlet allocation based multi-document summarization. 91-97
Amaresh Kumar Pandey, Tanveer J. Siddiqui:
An unsupervised Hindi stemmer with heuristic improvements. 99-105
Anurag Bhardwaj, Faisal Farooq, Huaigu Cao, Venu Govindaraju:
Topic based language models for OCR correction. 107-112
Eiman Al-Shammari, Jessica Lin:
A novel Arabic lemmatization algorithm. 113-118