Type Classification of Semi-Structured Documents.
Markus Tresch, Neal Palmer, Allen Luniewski:
Type Classification of Semi-Structured Documents.
VLDB 1995: 263-274
@inproceedings{DBLP:conf/vldb/TreschPL95,
author = {Markus Tresch and
Neal Palmer and
Allen Luniewski},
editor = {Umeshwar Dayal and
Peter M. D. Gray and
Shojiro Nishio},
title = {Type Classification of Semi-Structured Documents},
booktitle = {VLDB'95, Proceedings of 21th International Conference on Very
Large Data Bases, September 11-15, 1995, Zurich, Switzerland},
publisher = {Morgan Kaufmann},
year = {1995},
isbn = {1-55860-379-4},
pages = {263-274},
ee = {db/conf/vldb/TreschPL95.html},
crossref = {DBLP:conf/vldb/95},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Semi-structured documents (e.g. journal articles, electronic mail, television programs, mail order catalogs, ...) are often not explicitly typed; the only available type information is the implicit structure. An explicit type, however, is needed in order to apply object-oriented technology, like type-specific methods.
In this paper, we present an experimental vector space classifier for determining the type of semi-structured documents. Our goal was to design a high-performance classifier in terms of accuracy (recall and precision), speed, and extensibility.
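As a rough illustration of the general idea of a vector space classifier, the following Python sketch maps each document to a term-frequency vector and assigns the type whose training centroid is closest by cosine similarity. It is not the classifier described in the paper: the whitespace tokenization, the centroid construction, the sample documents, and all function names (vectorize, cosine, train, classify, precision_recall) are assumptions made for this example only.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Map a document to a sparse term-frequency vector.

    Whitespace tokenization is an assumption of this sketch, not the
    feature set used in the paper.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dict-like term -> weight)."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_docs):
    """Sum the vectors of each type into one centroid per type.

    Cosine similarity ignores vector length, so the sum serves as well
    as the mean here.
    """
    centroids = defaultdict(Counter)
    for text, doc_type in labeled_docs:
        centroids[doc_type].update(vectorize(text))
    return dict(centroids)


def classify(centroids, text):
    """Return the type whose centroid is most similar to the document."""
    vec = vectorize(text)
    return max(centroids, key=lambda t: cosine(vec, centroids[t]))


def precision_recall(pairs, doc_type):
    """Precision and recall for one type from (predicted, actual) pairs."""
    tp = sum(1 for p, g in pairs if p == doc_type and g == doc_type)
    fp = sum(1 for p, g in pairs if p == doc_type and g != doc_type)
    fn = sum(1 for p, g in pairs if p != doc_type and g == doc_type)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical training documents, invented for this sketch.
TRAINING = [
    ("From: alice To: bob Subject: meeting Date: monday", "mail"),
    ("From: carol To: dave Subject: lunch Date: friday", "mail"),
    ("Title: results Author: erin Abstract: we study References: none", "article"),
    ("Title: methods Author: frank Abstract: we present References: two", "article"),
]

CENTROIDS = train(TRAINING)
```

Under these assumptions, classify(CENTROIDS, "From: gina To: hal Subject: hello") selects the "mail" type, because only the mail centroid shares the header-like terms of that document. New types can be added by supplying more labeled documents to train, which is one simple reading of the extensibility goal named in the abstract.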
Copyright © 1995 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Printed Edition
Umeshwar Dayal, Peter M. D. Gray, Shojiro Nishio (Eds.):
VLDB'95, Proceedings of 21th International Conference on Very Large Data Bases, September 11-15, 1995, Zurich, Switzerland.
Morgan Kaufmann 1995, ISBN 1-55860-379-4
References
- [BFOS84]
- Leo Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone:
Classification and Regression Trees.
Wadsworth 1984, ISBN 0-534-98053-8
- [CACS94]
- Vassilis Christophides, Serge Abiteboul, Sophie Cluet, Michel Scholl:
From Structured Documents to Novel Query Facilities.
SIGMOD Conference 1994: 313-324
- [CM94]
- Mariano P. Consens, Tova Milo:
Optimizing Queries on Files.
SIGMOD Conference 1994: 301-312
- [GRW84]
- ...
- [Hoc94]
- Rainer Hoch:
Using IR Techniques for Text Classification in Document Analysis.
SIGIR 1994: 31-40
- [Hon94]
- ...
- [HS93]
- ...
- [Jam85]
- Mike James:
Classification Algorithms.
John Wiley 1985, ISBN 0-471-84799-2
- [Jon71]
- ...
- [LG94]
- David D. Lewis, William A. Gale:
A Sequential Algorithm for Training Text Classifiers.
SIGIR 1994: 3-12
- [ODL93]
- Katia Obraczka, Peter B. Danzig, Shih-Hao Li:
Internet Resource Discovery Services.
IEEE Computer 26(9): 8-22(1993)
- [Qui93]
- J. Ross Quinlan:
C4.5: Programs for Machine Learning.
Morgan Kaufmann 1993, ISBN 1-55860-238-0
- [Sal89]
- Gerard Salton:
Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer.
Addison-Wesley 1989, ISBN 0-201-12227-8
- [Sch93]
- Peter Schäuble:
SPIDER: A Multiuser Information Retrieval System for Semistructured and Dynamic Data.
SIGIR 1993: 318-327
- [SIG94a]
- ...
- [SIG94b]
- Richard T. Snodgrass, Marianne Winslett (Eds.):
Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, May 24-27, 1994.
ACM Press 1994
- [SLS+93]
- Kurt A. Shoens, Allen Luniewski, Peter M. Schwarz, James W. Stamos, Joachim Thomas II:
The Rufus System: Information Organization for Semi-Structured Data.
VLDB 1993: 97-107
- [SWY75]
- Gerard Salton, A. Wong, C. S. Yang:
A Vector Space Model for Automatic Indexing.
Commun. ACM 18(11): 613-620(1975)
- [vR79]
- C. J. van Rijsbergen:
Information Retrieval.
Butterworth 1979, ISBN 0-408-70929-4
- [YMP89]
- Clement T. Yu, Weiyi Meng, S. Park:
A Framework for Effective Retrieval.
ACM Trans. Database Syst. 14(2): 147-167(1989)