Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results.
Jason Tsong-Li Wang, Gung-Wei Chirn, Thomas G. Marr, Bruce A. Shapiro, Dennis Shasha, Kaizhong Zhang:
Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results.
SIGMOD Conference 1994: 115-125@inproceedings{DBLP:conf/sigmod/WangCMSSZ94,
author = {Jason Tsong-Li Wang and
Gung-Wei Chirn and
Thomas G. Marr and
Bruce A. Shapiro and
Dennis Shasha and
Kaizhong Zhang},
editor = {Richard T. Snodgrass and
Marianne Winslett},
title = {Combinatorial Pattern Discovery for Scientific Data: Some Preliminary
Results},
booktitle = {Proceedings of the 1994 ACM SIGMOD International Conference on
Management of Data, Minneapolis, Minnesota, May 24-27, 1994},
publisher = {ACM Press},
year = {1994},
pages = {115-125},
ee = {http://doi.acm.org/10.1145/191839.191863, db/conf/sigmod/WangCMSSZ94.html},
crossref = {DBLP:conf/sigmod/94},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Suppose you are given a set of natural entities
(e.g., proteins, organisms, weather patterns, etc.)
that possess some important common externally observable properties.
You also have
a structural description of the entities
(e.g., sequence, topological, or geometrical data) and a distance metric.
Combinatorial pattern discovery is the activity of finding
patterns in the structural data
that might explain these common properties based on the metric.
This paper presents an example of combinatorial pattern discovery:
the discovery of patterns in protein databases.
The structural representation we consider are strings
and the distance metric is string edit distance
permitting variable length don't cares.
Our techniques incorporate string matching algorithms
and novel heuristics for discovery and optimization, most
of which generalize to other combinatorial structures.
Experimental results of applying the techniques
to both generated data and functionally related protein families obtained
from the Cold Spring Harbor Laboratory show the
effectiveness of the proposed techniques.
When we apply the discovered patterns to perform protein classification,
they give information that is complementary to the best
protein classifier available today.
Copyright © 1994 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
Online Version (ACM WWW Account required): Full Text in PDF Format
CDROM Version: Load the CDROM "Volume 1 Issue 1, SIGMOD '93-'97" and ...
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
Richard T. Snodgrass, Marianne Winslett (Eds.):
Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, May 24-27, 1994.
ACM Press 1994 ,
SIGMOD Record 23(2),
June 1994
Contents
[Abstract and Index Terms]
[Full Text in PDF Format, 1016 KB]
References
- [1]
- Rakesh Agrawal, Sakti P. Ghosh, Tomasz Imielinski, Balakrishna R. Iyer, Arun N. Swami:
An Interval Classifier for Database Mining Applications.
VLDB 1992: 560-573
- [2]
- Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami:
Mining Association Rules between Sets of Items in Large Databases.
SIGMOD Conference 1993: 207-216
- [3]
- ...
- [4]
- ...
- [5]
- ...
- [6]
- ...
- [7]
- ...
- [8]
- ...
- [9]
- Vasant Dhar, Alexander Tuzhilin:
Abstract-Driven Pattern Discovery in Databases.
IEEE Trans. Knowl. Data Eng. 5(6): 926-938(1993)
- [10]
- William J. Frawley, Gregory Piatetsky-Shapiro, Christopher J. Matheus:
Knowledge Discovery in Databases: An Overview.
Knowledge Discovery in Databases 1991: 1-30
- [11]
- Karen A. Frenkel:
The Human Genome Project and Informatics.
Commun. ACM 34(11): 40-51(1991)
- [12]
- ...
- [13]
- Peter J. Haas, Arun N. Swami:
Sequential Sampling Procedures for Query Size Estimation.
SIGMOD Conference 1992: 341-350
- [14]
- Jiawei Han, Yandong Cai, Nick Cercone:
Knowledge Discovery in Databases: An Attribute-Oriented Approach.
VLDB 1992: 547-559
- [15]
- ...
- [16]
- Wen-Chi Hou, Gultekin Özsoyoglu:
Statistical Estimators for Aggregate Relational Algebra Queries.
ACM Trans. Database Syst. 16(4): 600-654(1991)
- [17]
- Lucas Chi Kwong Hui:
Color Set Size Problem with Application to String Matching.
CPM 1992: 230-243
- [18]
- ...
- [19]
- ...
- [20]
- Gad M. Landau, Uzi Vishkin:
Fast Parallel and Serial Approximate String Matching.
J. Algorithms 10(2): 157-169(1989)
- [21]
- ...
- [22]
- Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider:
Practical Selectivity Estimation through Adaptive Sampling.
SIGMOD Conference 1990: 1-11
- [23]
- Edward M. McCreight:
A Space-Economical Suffix Tree Construction Algorithm.
J. ACM 23(2): 262-272(1976)
- [24]
- Donald R. Morrison:
PATRICIA - Practical Algorithm To Retrieve Information Coded in Alphanumeric.
J. ACM 15(4): 514-534(1968)
- [25]
- Frank Olken, Doron Rotem:
Random Sampling from B+ Trees.
VLDB 1989: 269-277
- [26]
- ...
- [27]
- ...
- [28]
- ...
- [29]
- ...
- [30]
- Dennis Shasha, Jason Tsong-Li Wang:
New Techniques for Best-Match Retrieval.
ACM Trans. Inf. Syst. 8(2): 140-158(1990)
- [31]
- Abraham Silberschatz, Michael Stonebraker, Jeffrey D. Ullman:
Database Systems: Achievements and Opportunities.
Commun. ACM 34(10): 110-120(1991)
- [32]
- Bharathi Subramanian, Stanley B. Zdonik, Theodore W. Leung, Scott L. Vandenberg:
Ordered Types in the AQUA Data Model.
DBPL 1993: 115-135
- [33]
- Shalom Tsur:
Data Dredging.
IEEE Data Eng. Bull. 13(4): 58-63(1990)
- [34]
- ...
- [35]
- Robert A. Wagner, Michael J. Fischer:
The String-to-String Correction Problem.
J. ACM 21(1): 168-173(1974)
- [36]
- Jason Tsong-Li Wang, Kaizhong Zhang, Karpjoo Jeong, Dennis Shasha:
A System for Approximate Tree Matching.
IEEE Trans. Knowl. Data Eng. 6(4): 559-571(1994)
- [37]
- Jason Tsong-Li Wang, Dennis Shasha:
Query Processing for Distance Metrics.
VLDB 1990: 602-613
- [38]
- ...
- [39]
- Sun Wu, Udi Manber:
Fast Text Searching Allowing Errors.
Commun. ACM 35(10): 83-91(1992)
- [40]
- Kaizhong Zhang, Dennis Shasha:
Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems.
SIAM J. Comput. 18(6): 1245-1262(1989)
- [41]
- Kaizhong Zhang, Dennis Shasha, Jason Tsong-Li Wang:
Approximate Tree Matching in the Presence of Variable Length Don't Cares.
J. Algorithms 16(1): 33-66(1994)
- [42]
- George Kingsley Zipf:
Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology.
Addison-Wesley 1949
Copyright © Fri Mar 12 17:21:31 2010
by Michael Ley (ley@uni-trier.de)