2. AND 2008:
Singapore (SIGIR Workshop)
Daniel P. Lopresti, Shourya Roy, Klaus U. Schulz, L. Venkata Subramaniam (Eds.):
Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data, AND 2008, Singapore, July 24, 2008.
ACM International Conference Proceeding Series 303 ACM 2008, ISBN 978-1-60558-196-5
- Donna Harman:
Some thoughts on failure analysis for noisy data.
- John Tait:
Noise and information.
- Laurianne Sitbon, Patrice Bellot:
How to cope with questions typed by dyslexic users.
1-8
- Daniel P. Lopresti:
Optical character recognition errors and their effects on natural language processing.
9-16
- Ulrich Reffle, Annette Gotscharek, Christoph Ringlstetter, Klaus U. Schulz:
Successfully detecting and correcting false friends using channel profiles.
17-22
- Valentin Jijkoun, Mahboob Alam Khalid, Maarten Marx, Maarten de Rijke:
Named entity normalization in user generated content.
23-30
- Rema Ananthanarayanan, Vijil Chenthamarakshan, Prasad M. Deshpande, Raghuram Krishnapuram:
Rule based synonyms for entity extraction from noisy text.
31-38
- Jiyin He, Wouter Weerkamp, Martha Larson, Maarten de Rijke:
Blogger, stick to your story: modeling topical noise in blogs with coherence measures.
39-46
- Robert McArthur:
Uncovering deep user context from blogs.
47-54
- Jinfeng Zhuang, Steven C. H. Hoi, Aixin Sun:
On profiling blogs with representative entries.
55-62
- Soumya Datta, Sudeshna Sarkar:
A comparative study of statistical features of language in blogs-vs-splogs.
63-66
- Sreangsu Acharyya, Sumit Negi, L. Venkata Subramaniam, Shourya Roy:
Unsupervised learning of multilingual short message service (SMS) dialect from noisy examples.
67-74
- Antti Järvelin, Tuomas Talvensaari, Anni Järvelin:
Data driven methods for improving mono- and cross-lingual IR performance in noisy environments.
75-82
- Lipika Dey, S. K. Mirajul Haque:
Opinion mining from noisy text data.
83-90
- Rachit Arora, Balaraman Ravindran:
Latent Dirichlet allocation based multi-document summarization.
91-97
- Amaresh Kumar Pandey, Tanveer J. Siddiqui:
An unsupervised Hindi stemmer with heuristic improvements.
99-105
- Anurag Bhardwaj, Faisal Farooq, Huaigu Cao, Venu Govindaraju:
Topic based language models for OCR correction.
107-112
- Eiman Al-Shammari, Jessica Lin:
A novel Arabic lemmatization algorithm.
113-118
Copyright © Mon Mar 15 03:54:22 2010
by Michael Ley (ley@uni-trier.de)