Type Classification of Semi-Structured Documents.

Markus Tresch, Neal Palmer, Allen Luniewski: Type Classification of Semi-Structured Documents. VLDB 1995: 263-274
@inproceedings{DBLP:conf/vldb/TreschPL95,
  author    = {Markus Tresch and
               Neal Palmer and
               Allen Luniewski},
  editor    = {Umeshwar Dayal and
               Peter M. D. Gray and
               Shojiro Nishio},
  title     = {Type Classification of Semi-Structured Documents},
  booktitle = {VLDB'95, Proceedings of 21st International Conference on Very
               Large Data Bases, September 11-15, 1995, Zurich, Switzerland},
  publisher = {Morgan Kaufmann},
  year      = {1995},
  isbn      = {1-55860-379-4},
  pages     = {263-274},
  ee        = {db/conf/vldb/TreschPL95.html},
  crossref  = {DBLP:conf/vldb/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Semi-structured documents (e.g. journal articles, electronic mail, television programs, mail order catalogs, ...) are often not explicitly typed; the only available type information is the implicit structure. An explicit type, however, is needed in order to apply object-oriented technology, like type-specific methods.

In this paper, we present an experimental vector space classifier for determining the type of semi-structured documents. Our goal was to design a high-performance classifier in terms of accuracy (recall and precision), speed, and extensibility.
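To make the idea concrete, here is a minimal sketch of a vector space classifier for document types, together with the precision and recall measures named above. It is an illustration under stated assumptions, not the classifier from the paper: the abstract does not say which features or similarity measure the authors use, so the whitespace tokenizer, the summed per-type centroids, the cosine similarity, and all function names (tokenize, train, classify, precision_recall) are hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: features are lowercase whitespace-separated tokens.
    # The paper's actual feature extraction is not given in the abstract.
    return text.lower().split()


def to_vector(text):
    # Sparse term-frequency vector: token -> count.
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples):
    # Build one centroid per type by summing the vectors of its examples.
    # Cosine similarity ignores vector length, so summing is equivalent
    # to averaging here.
    centroids = defaultdict(Counter)
    for text, doc_type in examples:
        centroids[doc_type].update(to_vector(text))
    return dict(centroids)


def classify(centroids, text):
    # Assign the type whose centroid is most similar to the document.
    vec = to_vector(text)
    return max(centroids, key=lambda t: cosine(vec, centroids[t]))


def precision_recall(predicted, actual, doc_type):
    # Precision: fraction of documents predicted as doc_type that are correct.
    # Recall: fraction of true doc_type documents that were found.
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == doc_type and a == doc_type)
    pred_pos = sum(1 for p, _ in pairs if p == doc_type)
    actual_pos = sum(1 for _, a in pairs if a == doc_type)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


if __name__ == "__main__":
    # Tiny made-up training set; the type names are illustrative only.
    examples = [
        ("From: alice To: bob Subject: hello", "mail"),
        ("From: carol To: dave Subject: meeting", "mail"),
        ("Abstract Introduction References", "article"),
        ("Abstract Conclusion References", "article"),
    ]
    centroids = train(examples)
    print(classify(centroids, "From: eve Subject: lunch"))
    print(classify(centroids, "Abstract and References"))
```

One reason a centroid-based design suits the extensibility goal is that adding a new type only requires computing one more centroid from its examples; existing centroids are untouched.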

Copyright © 1995 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ...
DVD Version: Load "ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Umeshwar Dayal, Peter M. D. Gray, Shojiro Nishio (Eds.): VLDB'95, Proceedings of 21st International Conference on Very Large Data Bases, September 11-15, 1995, Zurich, Switzerland. Morgan Kaufmann 1995, ISBN 1-55860-379-4

References

[BFOS84]
Leo Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone: Classification and Regression Trees. Wadsworth 1984, ISBN 0-534-98053-8
[CACS94]
Vassilis Christophides, Serge Abiteboul, Sophie Cluet, Michel Scholl: From Structured Documents to Novel Query Facilities. SIGMOD Conference 1994: 313-324
[CM94]
Mariano P. Consens, Tova Milo: Optimizing Queries on Files. SIGMOD Conference 1994: 301-312
[GRW84]
...
[Hoc94]
Rainer Hoch: Using IR Techniques for Text Classification in Document Analysis. SIGIR 1994: 31-40
[Hon94]
...
[HS93]
...
[Jam85]
Mike James: Classification Algorithms. John Wiley 1985, ISBN 0-471-84799-2
[Jon71]
...
[LG94]
David D. Lewis, William A. Gale: A Sequential Algorithm for Training Text Classifiers. SIGIR 1994: 3-12
[ODL93]
Katia Obraczka, Peter B. Danzig, Shih-Hao Li: Internet Resource Discovery Services. IEEE Computer 26(9): 8-22(1993)
[Qui93]
J. Ross Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann 1993, ISBN 1-55860-238-0
[Sal89]
Gerard Salton: Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley 1989, ISBN 0-201-12227-8
[Sch93]
Peter Schäuble: SPIDER: A Multiuser Information Retrieval System for Semistructured and Dynamic Data. SIGIR 1993: 318-327
[SIG94a]
...
[SIG94b]
Richard T. Snodgrass, Marianne Winslett (Eds.): Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, May 24-27, 1994. ACM Press 1994
[SLS+93]
Kurt A. Shoens, Allen Luniewski, Peter M. Schwarz, James W. Stamos, Joachim Thomas II: The Rufus System: Information Organization for Semi-Structured Data. VLDB 1993: 97-107
[SWY75]
Gerard Salton, A. Wong, C. S. Yang: A Vector Space Model for Automatic Indexing. Commun. ACM 18(11): 613-620(1975)
[vR79]
C. J. van Rijsbergen: Information Retrieval. Butterworth 1979, ISBN 0-408-70929-4
[YMP89]
Clement T. Yu, Weiyi Meng, S. Park: A Framework for Effective Retrieval. ACM Trans. Database Syst. 14(2): 147-167(1989)
