Determining Text Databases to Search in the Internet.
Weiyi Meng, King-Lup Liu, Clement T. Yu, Xiaodong Wang, Yuhsi Chang, Naphtali Rishe:
Determining Text Databases to Search in the Internet.
VLDB 1998: 14-25
@inproceedings{DBLP:conf/vldb/MengLYWCR98,
author = {Weiyi Meng and
King-Lup Liu and
Clement T. Yu and
Xiaodong Wang and
Yuhsi Chang and
Naphtali Rishe},
editor = {Ashish Gupta and
Oded Shmueli and
Jennifer Widom},
title = {Determining Text Databases to Search in the Internet},
booktitle = {VLDB'98, Proceedings of 24th International Conference on Very
Large Data Bases, August 24-27, 1998, New York City, New York,
USA},
publisher = {Morgan Kaufmann},
year = {1998},
isbn = {1-55860-566-5},
pages = {14-25},
ee = {db/conf/vldb/MengLYWCR98.html},
crossref = {DBLP:conf/vldb/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Text data in the Internet can be partitioned into many databases naturally. Efficient retrieval of desired data can be achieved if we can accurately predict the usefulness of each database, because with such information, we only need to retrieve potentially useful documents from useful databases. In this paper, we propose two new methods for estimating the usefulness of text databases. For a given query, the usefulness of a text database in this paper is defined to be the number of documents in the database that are sufficiently similar to the query. Such a usefulness measure enables naive users to make informed decisions about which databases to search. We also consider the collection fusion problem. Because local databases may employ similarity functions that are different from that used by the global database, the threshold used by a local database to determine whether a document is potentially useful may be different from that used by the global database. We provide techniques that determine the best threshold for a given local database.
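The usefulness definition in the abstract can be made concrete with a small sketch. The Python below computes usefulness exactly, by scanning every document, which is precisely what the paper's estimation methods are meant to avoid; it is not the authors' method. The cosine similarity over raw term frequencies, the whitespace tokenizer, the strict "greater than threshold" test, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine_similarity(query_terms, doc_terms):
    """Cosine similarity between two bags of words (raw term frequencies)."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def usefulness(query, database, threshold):
    """Count documents whose similarity to the query exceeds the threshold.

    Assumption: "sufficiently similar" is read as similarity > threshold.
    """
    q_terms = query.lower().split()
    return sum(
        1
        for doc in database
        if cosine_similarity(q_terms, doc.lower().split()) > threshold
    )


def rank_databases(query, databases, threshold):
    """Return (name, usefulness) pairs, most useful database first."""
    scores = {
        name: usefulness(query, docs, threshold)
        for name, docs in databases.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Given a dictionary that maps database names to lists of document strings, `rank_databases` orders the databases by how many of their documents clear the threshold, which is the kind of information the abstract says lets a user decide which databases to search.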
Copyright © 1998 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Printed Edition
Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.):
VLDB'98, Proceedings of 24th International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA.
Morgan Kaufmann 1998, ISBN 1-55860-566-5
References
- [ALSF97]
- ...
- [BuSA93]
- ...
- [CLBC95]
- James P. Callan, Zhihong Lu, W. Bruce Croft:
Searching Distributed Collections with Inference Networks.
SIGIR 1995: 21-28
- [DuHa73]
- ...
- [Gass69]
- ...
- [GrGM95a]
- Luis Gravano, Hector Garcia-Molina:
Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies.
VLDB 1995: 78-89
- [GrGM95b]
- ...
- [GrGM97]
- Luis Gravano, Hector Garcia-Molina:
Merging Ranks from Heterogeneous Internet Sources.
VLDB 1997: 196-205
- [Harm93]
- ...
- [HoDr97]
- ...
- [KaMe91]
- ...
- [Kost94]
- Martijn Koster:
ALIWEB - Archie-like Indexing in the WEB.
Computer Networks and ISDN Systems 27(2): 175-182(1994)
- [Kow97]
- ...
- [LaYu82]
- K. Lam, Clement T. Yu:
A Clustered Search Algorithm Incorporating Arbitrary Term Dependencies.
ACM Trans. Database Syst. 7(3): 500-508(1982)
- [MaBi97]
- ...
- [MLYW98]
- ...
- [NCS]
- ...
- [SaMc83]
- Gerard Salton, Michael McGill:
Introduction to Modern Information Retrieval.
McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
- [Salt89]
- Gerard Salton:
Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer.
Addison-Wesley 1989, ISBN 0-201-12227-8
- [SeEt95]
- ...
- [SeEt97]
- ...
- [TVGJ95]
- ...
- [VGJL95]
- Ellen M. Voorhees, Narendra Kumar Gupta, Ben Johnson-Laird:
Learning Collection Fusion Strategies.
SIGIR 1995: 172-179
- [Widd89]
- ...
- [YaGM95]
- Tak W. Yan, Hector Garcia-Molina:
SIFT - a Tool for Wide-Area Information Dissemination.
USENIX Winter 1995: 177-186
- [YuLS78]
- Clement T. Yu, W. S. Luk, M. K. Siu:
On the Estimation of the Number of Desired Records with Respect to a Given Query.
ACM Trans. Database Syst. 3(1): 41-56(1978)
- [YuLe97]
- Budi Yuwono, Dik Lun Lee:
Server Ranking for Distributed Text Retrieval Systems on the Internet.
DASFAA 1997: 41-50
Copyright © Tue Mar 16 02:22:07 2010
by Michael Ley (ley@uni-trier.de)