ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results.

Jason Tsong-Li Wang, Gung-Wei Chirn, Thomas G. Marr, Bruce A. Shapiro, Dennis Shasha, Kaizhong Zhang: Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results. SIGMOD Conference 1994: 115-125
@inproceedings{DBLP:conf/sigmod/WangCMSSZ94,
  author    = {Jason Tsong-Li Wang and
               Gung-Wei Chirn and
               Thomas G. Marr and
               Bruce A. Shapiro and
               Dennis Shasha and
               Kaizhong Zhang},
  editor    = {Richard T. Snodgrass and
               Marianne Winslett},
  title     = {Combinatorial Pattern Discovery for Scientific Data: Some Preliminary
               Results},
  booktitle = {Proceedings of the 1994 ACM SIGMOD International Conference on
               Management of Data, Minneapolis, Minnesota, May 24-27, 1994},
  publisher = {ACM Press},
  year      = {1994},
  pages     = {115-125},
  ee        = {http://doi.acm.org/10.1145/191839.191863, db/conf/sigmod/WangCMSSZ94.html},
  crossref  = {DBLP:conf/sigmod/94},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Suppose you are given a set of natural entities (e.g., proteins, organisms, weather patterns, etc.) that possess some important common externally observable properties. You also have a structural description of the entities (e.g., sequence, topological, or geometrical data) and a distance metric. Combinatorial pattern discovery is the activity of finding patterns in the structural data that might explain these common properties based on the metric.

This paper presents an example of combinatorial pattern discovery: the discovery of patterns in protein databases. The structural representation we consider are strings and the distance metric is string edit distance permitting variable length don't cares. Our techniques incorporate string matching algorithms and novel heuristics for discovery and optimization, most of which generalize to other combinatorial structures. Experimental results of applying the techniques to both generated data and functionally related protein families obtained from the Cold Spring Harbor Laboratory show the effectiveness of the proposed techniques. When we apply the discovered patterns to perform protein classification, they give information that is complementary to the best protein classifier available today.

Copyright © 1994 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 1, SIGMOD '93-'97" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Richard T. Snodgrass, Marianne Winslett (Eds.): Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, May 24-27, 1994. ACM Press 1994 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML, SIGMOD Record 23(2), June 1994
Contents

Online Edition: ACM Digital Library

[Abstract and Index Terms]
[Full Text in PDF Format, 1016 KB]

References

[1]
Rakesh Agrawal, Sakti P. Ghosh, Tomasz Imielinski, Balakrishna R. Iyer, Arun N. Swami: An Interval Classifier for Database Mining Applications. VLDB 1992: 560-573 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[2]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[3]
...
[4]
...
[5]
...
[6]
...
[7]
...
[8]
...
[9]
Vasant Dhar, Alexander Tuzhilin: Abstract-Driven Pattern Discovery in Databases. IEEE Trans. Knowl. Data Eng. 5(6): 926-938(1993) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[10]
William J. Frawley, Gregory Piatetsky-Shapiro, Christopher J. Matheus: Knowledge Discovery in Databases: An Overview. Knowledge Discovery in Databases 1991: 1-30 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[11]
Karen A. Frenkel: The Human Genome Project and Informatics. Commun. ACM 34(11): 40-51(1991) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[12]
...
[13]
Peter J. Haas, Arun N. Swami: Sequential Sampling Procedures for Query Size Estimation. SIGMOD Conference 1992: 341-350 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[14]
Jiawei Han, Yandong Cai, Nick Cercone: Knowledge Discovery in Databases: An Attribute-Oriented Approach. VLDB 1992: 547-559 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[15]
...
[16]
Wen-Chi Hou, Gultekin Özsoyoglu: Statistical Estimators for Aggregate Relational Algebra Queries. ACM Trans. Database Syst. 16(4): 600-654(1991) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[17]
Lucas Chi Kwong Hui: Color Set Size Problem with Application to String Matching. CPM 1992: 230-243 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[18]
...
[19]
...
[20]
Gad M. Landau, Uzi Vishkin: Fast Parallel and Serial Approximate String Matching. J. Algorithms 10(2): 157-169(1989) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[21]
...
[22]
Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[23]
Edward M. McCreight: A Space-Economical Suffix Tree Construction Algorithm. J. ACM 23(2): 262-272(1976) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[24]
Donald R. Morrison: PATRICIA - Practical Algorithm To Retrieve Information Coded in Alphanumeric. J. ACM 15(4): 514-534(1968) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[25]
Frank Olken, Doron Rotem: Random Sampling from B+ Trees. VLDB 1989: 269-277 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[26]
...
[27]
...
[28]
...
[29]
...
[30]
Dennis Shasha, Jason Tsong-Li Wang: New Techniques for Best-Match Retrieval. ACM Trans. Inf. Syst. 8(2): 140-158(1990) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[31]
Abraham Silberschatz, Michael Stonebraker, Jeffrey D. Ullman: Database Systems: Achievements and Opportunities. Commun. ACM 34(10): 110-120(1991) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[32]
Bharathi Subramanian, Stanley B. Zdonik, Theodore W. Leung, Scott L. Vandenberg: Ordered Types in the AQUA Data Model. DBPL 1993: 115-135 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[33]
Shalom Tsur: Data Dredging. IEEE Data Eng. Bull. 13(4): 58-63(1990) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[34]
...
[35]
Robert A. Wagner, Michael J. Fischer: The String-to-String Correction Problem. J. ACM 21(1): 168-173(1974) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[36]
Jason Tsong-Li Wang, Kaizhong Zhang, Karpjoo Jeong, Dennis Shasha: A System for Approximate Tree Matching. IEEE Trans. Knowl. Data Eng. 6(4): 559-571(1994) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[37]
Jason Tsong-Li Wang, Dennis Shasha: Query Processing for Distance Metrics. VLDB 1990: 602-613 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[38]
...
[39]
Sun Wu, Udi Manber: Fast Text Searching Allowing Errors. Commun. ACM 35(10): 83-91(1992) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[40]
Kaizhong Zhang, Dennis Shasha: Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems. SIAM J. Comput. 18(6): 1245-1262(1989) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[41]
Kaizhong Zhang, Dennis Shasha, Jason Tsong-Li Wang: Approximate Tree Matching in the Presence of Variable Length Don't Cares. J. Algorithms 16(1): 33-66(1994) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[42]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Fri Mar 12 17:21:31 2010 by Michael Ley (ley@uni-trier.de)