VLDB'98 - Program

1998 VLDB Conference Program
Time	Banquet Hall South	Banquet Hall North	Julliard Complex
Sunday, August 23, 1998
6:00pm - 8:00	Registration in the Atrium (5th Floor)
8:00 - 11:00	Welcome Reception & Registration at the Roof Top Revolving Restaurant

Monday, August 24, 1998
8:00 - 9:00	Breakfast
9:00 - 9:30	OPENING CEREMONY
9:30 - 11:00	Keynote Talk 1 Doug Tygar: Atomicity versus Anonymity: Distributed Transactions for Electronic Commerce
11:00 - 11:30	Coffee Break
11:30 - 1:00	Industrial Session 1 Complex Query Languages	Tutorial 1 Electronic Commerce	Research Session 1 Text and Semistructured Data
1:00 - 2:30	Lunch (on your own)
2:30 - 4:00	Research Session 2 Performance and Multimedia	Tutorial 2 Caching and Replication on the Internet	Research Session 3 Join Processing
4:00 - 4:30	Coffee Break
4:30 - 6:00	Research Session 4 Heterogeneity and Interoperability	Panel 1 Is Web-site Management a Database Problem?	Research Session 5 Query Processing

8:00pm	Miss Saigon Broadway Show (optional)

Tuesday, August 25, 1998
8:00 - 8:30	Breakfast
8:30 - 10:00	Research Session 6 Similarity	Tutorial 3a Managing Financial Time Series Data: Array Database Systems	Research Session 7 Query Optimization
10:00 - 10:30	Coffee Break
10:30 - 12:00	Industrial Session 2 Experience with Large Data Warehouses	Tutorial 3b Managing Financial Time Series Data: Object-Relational and Object Database Systems	Research Session 8 Query Processing and Optimization
12:00 - 1:30	Lunch (on your own)
1:30 - 3:00	Keynote Talk 2 David E. Shaw: Technology and the Future of Commerce and Finance
3:00 - 3:30	Coffee Break
3:30 - 5:00	Industrial Session 3 New Technology Database Vendor Offerings	Panel 2 Information, Communication, and Money: For What Can We Charge and How Can We Meter it?	Research Session 9 Data Mining and Warehousing

7:30 - 10:30	VLDB Banquet (Cruise Ship)

Wednesday, August 26, 1998
8:00 - 8:30	Breakfast
8:30 - 10:00	Keynote Talk 3 Charles Rozwat: DBMSs: From Data Management to Enterprise Management
10:00 - 10:30	Coffee Break
10:30 - 12:00	Research Session 10 High-Dimensional and Temporal Data	Industrial Session 4 Database Vendor Internals: the Oracle Story	Research Session 11 Data Mining I
12:00 - 1:30	Lunch (on your own)
1:30 - 3:00	Industrial Session 5 Database Vendor Kernels	Tutorial 4 Database Reverse Engineering	Research Session 12 Data Mining II
3:00 - 3:30	Coffee Break
3:30 - 5:00	Industrial Session 6 Support for Data Warehouses	Tutorial 5a Data Mining and KDD: KDD Overview with Details on Some Mining Methods	Research Session 13 Systems Issues I
5:00 - 5:30	Coffee Break
5:30 - 7:00	Research Session 14 Data Warehousing	Tutorial 5a (cont.) and Tutorial 5b Data Mining and KDD: Financial Applications	Research Session 15 Systems Issues II

9:00pm	Blue Note Jazz Club (optional)

Thursday, August 27, 1998
8:00 - 8:30	Breakfast
8:30 - 10:00	Keynote Talk 4. 10-Year Best VLDB Paper Award Dina Bitton and Jim Gray: The Rebirth of Database Machine Research
10:00 - 10:30	Coffee Break
10:30 - 12:00	Research Session 16 Spatial Data	Panel 3 Starting (and Sometimes Ending) a Database Company	Research Session 17 Data Mining III
12:00	CLOSING CEREMONY

6:00pm	Comedy Nation Dinner followed by Carolines' Comedy Club Show (optional)
8:00pm	Carolines' Comedy Club Show (optional)

The Industrial Systems Exhibits take place in the Lyceum Complex at the following times:
- Monday, August 24, 1:00pm - 6:00pm
- Tuesday, August 25, 9:00am - 6:00pm
- Wednesday, August 26, 9:00am - 6:00pm
All plenary sessions take place in the combined Banquet Hall North and South.

Invited Talks

Invited Talk 1. Atomicity versus Anonymity: Distributed Transactions for Electronic Commerce
Doug Tygar (CMU and UC Berkeley, USA)
Monday, August 24, 9:30 - 11:00
Session Chair: Inderpal Singh Mumick (Savera Systems, USA)

Electronic commerce challenges our notions of distributed transactions in several ways. I discuss how distributed transactions can apply to electronic commerce transactions, with special emphasis on the role of atomicity. I discuss the application of these ideas to two systems I have helped design and build: NetBill (a system for highly atomic micro-transactions) and Cryptographic Postage Indicia (a system for generating postage on laser printers attached to PCs or other devices). I discuss the difficulties in integrating atomic, anonymous payment systems and some issues in supporting anonymous auctions. Finally, I conclude with a set of open questions.

Doug Tygar.

In September, Doug Tygar will take a position as Professor of Computer Science at UC Berkeley (he will hold appointments in two departments: EECS and the new School of Information Management and Systems). Currently, Doug Tygar is on the computer science faculty at Carnegie Mellon University, where he has taught since receiving his PhD from Harvard University.

Doug Tygar has built a number of systems for computer security and electronic commerce, including: Cryptographic Postage Indicia (now an official US Postal Service standard for generating postage on PCs) and NetBill (an electronic commerce payment system for highly atomic micro-transactions).

His current projects include developing systems for electronic auctions, systems for better human interfaces for computer security, systems using secure coprocessors, and support for secure remote execution. Dr. Tygar has won numerous awards including an NSF Presidential Young Investigator Award and a "favorite teacher" award at CMU.

Invited Talk 2. Technology and the Future of Commerce and Finance
David E. Shaw (D. E. Shaw & Co., Inc., USA)
Tuesday, August 25, 1:30 - 3:00
Session Chair: Alexandros Biliris (AT&T Labs, USA)

Over the coming years, an increasingly ubiquitous and increasingly capacious Internet will introduce new opportunities for the creation of tightly integrated databases distributed across multiple institutions. These new capabilities, along with certain techniques arising from the emerging field of computational finance, could ultimately transform a substantial portion of the world's commercial and financial activity in fundamental ways. This talk will focus on some of the most significant changes such technologies may induce in the structure of the world financial system and the mechanisms of global commerce. Consideration will be given to such topics as algorithmic trading and portfolio optimization; electronic markets, automated market-making, and the historical inevitability of computational disintermediation; and the future of electronic commerce, including the potential use of shared knowledge bases incorporating standardized representations of enormous numbers of products and services available from multiple sources.

David E. Shaw, Ph.D.

David E. Shaw is the chairman and chief executive officer of D. E. Shaw & Co., Inc., a global investment bank whose activities center on various aspects of the intersection between technology and finance. The firm has been described as "arguably the most cutting-edge trading firm on Wall Street" [Investment Dealers' Digest, November 15, 1993], and as "the most intriguing and mysterious force on Wall Street today" [Fortune, February 5, 1996]. Dr. Shaw also serves as chairman of Juno Online Services, L.P., the nation's second largest provider (after America Online) of dialup Internet e-mail access, which now provides free, advertiser-supported e-mail service to over five million Americans.

The author of 62 scholarly publications, Dr. Shaw received his Ph.D. from Stanford University in 1980 and served on the faculty of the Department of Computer Science at Columbia University before joining Morgan Stanley & Co. in 1986 as its vice president in charge of automated analytical trading technology. Earlier, he founded and served as president and CEO of Stanford Systems Corporation, a computer systems firm based in California's "Silicon Valley." In 1994, he was appointed by President Clinton to the President's Committee of Advisors on Science and Technology, in which capacity he serves as chair of the Panel on Educational Technology.

Invited Talk 3. DBMSs: From Data Management to Enterprise Management
Charles Rozwat (Oracle Corporation, USA)
Wednesday, August 26, 8:30 - 10:00
Session Chair: Jennifer Widom (Stanford University, USA)

Database systems are evolving from passive stores of information to more active systems that analyze and automatically distribute information to the appropriate people and processes. The database provides the most cost-effective, scalable, available and reliable platform for managing all your information processing needs. In today's enterprise all critical decisions are made based on timely access to information. Therefore, it is important that the business logic that acts on the information should be managed as carefully as data. This makes the database the focal point for managing an enterprise and the logical choice for deploying data-centric application logic.

This talk outlines the evolution of the database product from a data storage engine to a provider of a host of services needed for development, deployment and integration of applications. These services enable data management, warehousing, OLAP, enterprise resource planning, messaging, electronic commerce, and web-based and internet applications, to name a few. This talk will also explore how the database management system contrasts with and complements the operating system in meeting the challenges of the information age.

Charles Rozwat.

Charles Rozwat is Senior Vice President of Oracle's Database Server Division. He is responsible for Product Development and Product Management for Oracle's Database Server Products. Previously at Oracle, Mr. Rozwat was Vice President of the New England Development Center, where he was responsible for Development, Product Management and Marketing for Rdb, Enterprise Management Performance products, the Spatial Data Option and Data Cartridge Development.

Prior to joining Oracle, Mr. Rozwat was a senior manager at Digital Equipment Corporation responsible for the Database business. He held a variety of senior management positions covering Database, Transaction Processing and Office products over his 17 years at Digital.

Invited Talk 4. The Rebirth of Database Machine Research
Dina Bitton (Integrated Data Systems, USA) and Jim Gray (Microsoft Research, USA)
Thursday, August 27, 8:30 - 10:00
Session Chair: Oded Shmueli (Technion, Israel)

Disks have changed a lot in the last decade: they now have 100 times more capacity, and each disk comes with a high-performance processor, with a few megabytes of memory, and a simple operating system. These disks have sophisticated caching and prefetch algorithms. The next step is obvious: each disk controller will be a supercomputer with a large memory and a powerful operating system. It will soon be possible to move most relational database processing to processors within the storage subsystem. Indeed, it will be possible to move part of the application to the storage subsystem as well. This talk outlines these hardware and software trends. Then it outlines some of the design alternatives: general-purpose vs special-purpose designs, degree of autonomy, hierarchical vs peer-to-peer designs, and the relationship to the database machine architectures of the 1970's.

Dina Bitton.

Dina recently started a new company, Integrated Data Systems, to develop and market data integration products. Until March of 1998, she was Chairman and Chief Technology Officer of DBStar, Inc., which she founded in 1993. DBStar is a leading software vendor in the area of Data Warehousing. Its flagship product, the Migration Architect, discovers structural patterns in data and automates the migration of heterogeneous data sources into a relational database.

In the last two decades, Dina has made research contributions in the area of high performance database systems: sorting of large files, database machines, benchmarking, and data warehousing. Prior to founding a Silicon Valley startup, she held faculty positions at Cornell University and at the University of Illinois at Chicago.

She holds a Ph.D. in Computer Science from the University of Wisconsin-Madison, and B.Sc. and M.Sc. degrees cum laude in Mathematics from the Technion Institute in Israel.

Jim Gray.

Jim is a specialist in database and transaction processing computer systems. At Microsoft his research focuses on scalable computing: building super-servers and workgroup systems from commodity software and hardware. Prior to joining Microsoft, he worked at Digital, Tandem, IBM and AT&T on database and transaction processing systems including Rdb, ACMS, NonStopSQL, Pathway, System R, SQL/DS, DB2, and IMS-Fast Path.

He is editor of the Performance Handbook for Database and Transaction Processing Systems, and co-author of Transaction Processing Concepts and Techniques. He is a Member of the National Academy of Engineering, Fellow of the ACM, a member of the National Research Council’s Computer Science and Telecommunications Board, Trustee of the VLDB Foundation, and Editor of the Morgan Kaufmann series on Data Management.

Current Activities: Research on fault-tolerant, parallel, and distributed database systems. Manager of Microsoft's Bay Area Research Lab (BARC).

Tutorials

Tutorial 1. Electronic Commerce (and how it impacts DB Research)
Anant Jhingran and Manoj Kumar (IBM, USA)
Monday, August 24, 11:30 - 1:00
Session chair: Paolo Atzeni (Universita' di Roma Tre, Italy)

In this tutorial, we will discuss the scope of electronic commerce using a value-chain framework that covers the tasks involved from both the buyer's and the seller's perspectives. In this framework, we will broadly discuss the state of the art in the pre-sale, sales and post-sales processes. We then show how various aspects of these processes and systems affect, and are affected by, database technology (shown in parentheses in the following sentence). In particular, we discuss personalization (data mining issues), business rule mechanisms (trigger technology), data hiding for supply chain efficiencies (access control and statistical databases), sales analysis (warehousing and data mining), document storage and exchange (XML repositories), inter-business transactional semantics (two-phase commit), etc. People interested in learning about Electronic Commerce in general, and about its relationship to database research in particular, are strongly encouraged to attend the tutorial. We will also talk about commercial implications of the technologies.

Anant Jhingran is the Senior Manager of Networked Data Systems and has primary research responsibilities in Decision Support and Electronic Commerce. He got his PhD from University of California at Berkeley in 1990, and has been at IBM T.J. Watson Research Center since then. In the early years at IBM, he focussed on parallel and scalable databases, and received an IBM Corporate Award for his contributions to DB2 Parallel Edition. In the last couple of years, he has been focussing on electronic commerce and how it influences, and is influenced by, database research.

Manoj Kumar received the M.S. and Ph.D. degrees in electrical engineering from Rice University, Houston, Texas, in 1981 and 1984, respectively. Since then he has been at IBM T.J. Watson Research Center. Currently he is the manager of the Electronic Commerce Systems group. He is investigating new business applications which exploit the Internet to automate inter-enterprise and business-to-consumer interactions.

Tutorial 2. Caching and Replication on the Internet
Michael Rabinovich (AT&T Labs, USA)
Monday, August 24, 2:30 - 4:00
Session chair: Rivka Ladin (Tandem, Israel)

As commercial interest in the Internet continues to grow, the issues of scalability and performance become increasingly important. In fact, we are on the verge of another qualitative jump in Internet load levels, due to the upcoming replacement of slow modem lines, which act as floodgates limiting user access to the Internet, with much faster alternatives like ISDN lines and cable modems. Consequently, caching and replication, being the primary tools that address these issues, are fast becoming a focus of attention in both industrial and academic research communities. This tutorial introduces the audience to the current state of the art in the area of caching and replication on the Internet. It should be of interest to both researchers and IT professionals. Researchers should find it interesting to see how specifics of the Internet environment motivate new approaches to caching and replication, and learn about some of the latest product offerings in this area. IT professionals will learn, among other things, about some recent experimental analyses of Web behavior and some advantages and pitfalls of the latest technologies they might be considering.

Michael Rabinovich is a Principal Technical Staff Member at AT&T Labs - Research. He has published extensively in the areas of replicated systems, transaction management, and Web performance. He is a co-author of the CRISP distributed proxy cache. His recent professional activities include organizing a panel on database research and the Web at ICDE-98 and serving as an Industrial Program Co-Chair for SIGMOD-99.

Tutorial 3a. Managing Financial Time Series Data: Array Database Systems
Dennis Shasha (Courant Institute, New York University, USA)
Tuesday, August 25, 8:30 - 10:00
Session chair: Brad Adelberg (Northwestern University, USA)

Arrays play a major role in finance and science, because of the necessity to represent ordered sequences of values such as prices, inventories, measurements, and so on. Operations on arrays include commutative aggregate functions such as sum and variance, moving aggregates such as moving averages, forecasting based on autoregression or data mining, correlations between different arrays, and array dilations due to different measurement granularities. This tutorial discusses the following questions: Which linguistic approaches are promising for such problems? Which implementation approaches work well? Is there a reasonable benchmark for array applications, at least in finance?

Dennis Shasha does research on biological pattern discovery, parallel processing, and database tuning. He has consulted at the various Bell Labs and on Wall Street on problems having to do with database tuning and database design, most recently for finance. Lately, he has helped design a database language for arrays in cooperation with Kx Systems, a company that develops language environments for high performance applications in finance. He has written several books and published papers in the usual places. You can find details at http://cs.nyu.edu/cs/faculty/shasha/index.html

Tutorial 3b. Managing Financial Time Series Data: Object-Relational and Object Database Systems
Lory Molesky (Oracle Corporation, USA) and Michael Caruso (Innovative Systems Techniques, USA)
Tuesday, August 25, 10:30 - 12:00
Session chair: Brad Adelberg (Northwestern University, USA)

This tutorial provides an overview of financial time series applications, such as portfolio management, investment analysis, and economic modeling, and discusses database support for these applications. The intended audience includes (a) database researchers and practitioners who are curious about the financial time series domain, and (b) members of the financial community who are interested in database solutions for time series applications. The first part of the tutorial will focus on time series modeling issues and various aspects of financial time series data. Topics include the dynamics of portfolio management, schemas for classifying holdings and trades, and the requirements of financial time series data management. The second part of the tutorial addresses database support for time series applications. In the past, time series applications have largely been supported by specialty file systems employing proprietary data stores. Specifically, we show the deficiencies of the relational model, then examine how object-relational and object database systems can satisfy the requirements of time series applications.

Lory Molesky received his PhD in Computer Science in 1996, and is currently with Oracle Corp.'s New England Development Center. He is the primary architect of Oracle's time series database cartridge, and has given numerous seminars on this and other topics in database systems.

Michael Caruso received his S.B. and M.S. degrees in Electrical Engineering and Computer Science from MIT in 1975 and 1978. He is a co-founder and Director of Research and Development at Innovative Systems Techniques, Inc. (Insyte) in Newton, MA. Mike conceived the architecture, design, and implementation of Vision, Insyte's temporal, object-oriented analytical system.

Tutorial 4. Database Reverse Engineering
Michael Blaha (OMT Associates, USA)
Wednesday, August 26, 1:30 - 3:00
Session chair: Euthimios Panagos (AT&T Labs, USA)

Reverse engineering is the process of taking an existing design and extracting the underlying conceptual intent. There are many reasons for wanting to perform reverse engineering. For example, reverse engineering is often used to elicit requirements from past systems to seed the development of new systems. Reverse engineering also facilitates legacy data conversion. A third motivation is software assessment; reverse engineering offers an opportunity to assess the quality of vendor products that include a database. This tutorial will summarize our process and discuss some of our technical and business experiences. The theory for relational database design is quite good, but the practice is often poor as we will illustrate. Our specific focus will be on reverse engineering of relational databases, but many similar issues also arise with other database paradigms.

Michael Blaha is an alumnus of the GE R&D Center in Schenectady, New York and an author of the OMT methodology. For the past several years Dr. Blaha has been an independent consultant and trainer in the areas of modeling, database design, and reverse engineering. He is the lead author for the new book "Object-Oriented Modeling and Design for Database Applications".

Tutorial 5a. Data Mining and KDD: KDD Overview with Details on Some Mining Methods
Usama Fayyad (Microsoft Research, USA) and Evangelos Simoudis (IBM, USA)
Wednesday, August 26, 3:30 - 5:00 and 5:30 - 6:00
Session chair: Ted Johnson (AT&T Labs, USA)

Data Mining methods have their origins in a variety of fields: Statistics, Databases, Pattern Recognition, Visualization, Parallel Computing, and Information Retrieval. This tutorial provides an overview, bringing in a mix of techniques and notions from the various constituent fields. The goal is to provide more natural methods for effectively navigating, summarizing, and making better use of database content. Unfortunately, methods from statistics usually do not consider database issues of data access and scalability; data mining methods from databases often have insufficient grounding in relevant principles from pattern recognition, statistics, etc. SQL requires an exact logical description of the target data subset, whereas queries in decision support are often imprecise and exploratory. OLAP extends the user-query-driven framework but still relies on the human to drive the process and to "spot" interesting patterns. In contrast, data mining systems offer an approach where the machine can do most of the tedious work to help guide the user to events and views of interest.

Usama Fayyad: Senior Researcher, Microsoft Research. Ph.D., 1991, University of Michigan, Ann Arbor. Joined Caltech/NASA Jet Propulsion Laboratory (89-96): headed group in mining scientific databases. Received several JPL awards including 1994 NASA Exceptional Achievement Medal. Chaired KDD conferences (94-96). Editor-in-chief of journal: Data Mining and Knowledge Discovery. Co-editor of MIT Press book (1996) on KDD.

Evangelos Simoudis: Vice President, Global Business Intelligence Solutions - IBM North America, where he is responsible for the development and deployment of data mining and decision support solutions to IBM's customers worldwide. Before IBM, he worked at Lockheed leading data mining research. He received his Ph.D. from Brandeis University, then worked at DEC's AI Center.

Tutorial 5b. Data Mining and KDD: Financial Applications
Tae Horn Hann (University of Karlsruhe, Germany) and Gholamreza Nakhaeizadeh (Daimler-Benz, Germany)
Wednesday, August 26, 6:00 - 7:00
Session chair: Ted Johnson (AT&T Labs, USA)

In recent years data mining has been increasingly applied in finance. Many financial management institutions consider it an innovative technology to support conventional quantitative techniques. Its use in computational finance will have a major impact on the modeling of currency markets, tactical asset allocation, bond and stock valuation, and portfolio optimization. In addition, the application of data mining to scoring tasks delivers valuable support for the management of client credit risk and for fraud detection. We will discuss the steps involved in applying data mining to financial applications: data analysis, preprocessing, model selection, evaluation, and performance measures. Based on these steps we present two applications: the first forecasts a financial time series using neural nets; the second is a real-world application to credit scoring. This tutorial addresses practitioners as well as researchers from finance, econometrics, and information systems.

Tae Horn Hann was born in Frankfurt am Main, Germany in 1968. He received his Master's degree from the University of Karlsruhe in 1995. He is currently a Ph.D. student at the University of Karlsruhe. His interests include the application of AI in finance, such as neural networks and symbolic machine learning, as well as conventional econometric approaches.

Gholamreza Nakhaeizadeh completed his Ph.D. in Applied Bayesian Statistics in 1984 and his Habilitation (Postdoctoral Thesis) in Applied Econometrics in 1988, both at the University of Karlsruhe, Germany. Currently he is the head of the Department of Machine Learning and Data Mining at the Daimler-Benz Research Center in Ulm, Germany. Since 1989 he has also been Professor of Economics and Econometrics at the University of Karlsruhe.

Research Sessions

Research Session 1. Text and Semistructured Data
Monday, August 24, 11:30 - 1:00
Session chair: Sophie Cluet (INRIA, France)

Determining Text Databases to Search in the Internet.
Weiyi Meng (SUNY at Binghamton, USA), King-Lup Liu (University of Illinois at Chicago, USA), Clement Yu (University of Illinois at Chicago, USA), Xiaodong Wang (SUNY at Binghamton, USA), Yuhsi Chang (SUNY at Binghamton, USA), N. Rishe (Florida International University, USA)

Proximity Search in Databases.
Roy Goldman, Narayanan Shivakumar, Suresh Venkatasubramanian, Hector Garcia-Molina (Stanford University, USA)

Incremental Maintenance for Materialized Views over Semistructured Data.
Serge Abiteboul (INRIA, France), Jason McHugh, Michael Rys, Vasilis Vassalos, Janet L. Wiener (Stanford University, USA)

Research Session 2. Performance and Multimedia
Monday, August 24, 2:30 - 4:00
Session chair: Tiziana Catarci (University of Rome, Italy)

Performance Measurements of Tertiary Storage Devices.
Theodore Johnson (AT&T Labs, USA), Ethan Miller (University of Maryland-Baltimore County, USA)

Active Storage for Large-Scale Data Mining and Multimedia. (vision paper)
Erik Riedel, Garth Gibson, Christos Faloutsos (Carnegie Mellon University, USA)

Resource Scheduling for Composite Multimedia Objects.
Minos Garofalakis (University of Wisconsin-Madison, USA), Yannis Ioannidis (University of Wisconsin-Madison, USA), Banu Ozden (Bell Labs, USA)

Research Session 3. Join Processing
Monday, August 24, 2:30 - 4:00
Session chair: Kyu-Young Whang (KAIST, Korea)

Integrating Hash Joins and Hash Teams.
Goetz Graefe, Ross Bunker, Shaun Cooper (Microsoft Corporation, USA)

Diag-Join: An Opportunistic Join Algorithm for 1:N Relationships.
Sven Helmer, Till Westmann, Guido Moerkotte (Mannheim University, Germany)

Evaluating Functional Joins Along Nested Reference Sets in Object-Relational and Object-Oriented Databases.
Reinhard Braumandl, Jens Claussen, Alfons Kemper (Passau University, Germany)

Research Session 4. Heterogeneity and Interoperability
Monday, August 24, 4:30 - 6:00
Session chair: Asuman Dogac (Middle East Technical University, Turkey)

Using Schema Matching to Simplify Heterogeneous Data Translation.
Tova Milo, Sagit Zohar (Tel Aviv University, Israel)

nD-SQL: A Multi-Dimensional Language for Interoperability and OLAP.
Frederic Gingras, Laks V.S. Lakshmanan (Concordia University, Canada)

The Heterogeneity Problem and Middleware Technology: Experiences with and Performance of Database Gateways. (experience paper)
Fernando de Ferreira Rezende and Klaudia Hergula (Daimler-Benz AG, Germany)

Research Session 5. Query Processing
Monday, August 24, 4:30 - 6:00
Session chair: Peter M.G. Apers (University of Twente, Netherlands)

Reducing the Braking Distance of an SQL Query Engine.
Michael J. Carey (IBM Almaden Research Center, USA), Donald Kossmann (University of Passau, Germany)

Querying Continuous Time Sequences.
Ling Lin, Tore Risch (Linkoping University, Sweden)

Low-Cost Compensation-Based Query Processing.
Oystein Grovlen (ClustRa, Norway), Svein-Olaf Hvasshovd (Norwegian University of Science and Technology, Norway), Oystein Torbjornsen (ClustRa, Norway)

Research Session 6. Similarity
Tuesday, August 25, 8:30 - 10:00
Session chair: Tan Kian Lee (National University of Singapore, Singapore)

A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces.
Roger Weber, Hans-J. Schek (ETH, Switzerland), Stephen Blott (Bell Labs, USA)

Improving Adaptable Similarity Query Processing by using Approximations.
Mihael Ankerst, Bernhard Braunmueller, Hans-Peter Kriegel, Thomas Seidl (University of Munich, Germany)

MindReader: Querying Databases Through Multiple Examples.
Yoshiharu Ishikawa (Nara Inst. of Science and Technology, Japan), Ravishankar Subramanya (Pittsburgh Supercomputing Center, USA), Christos Faloutsos (Carnegie Mellon University, USA)

Research Session 7. Query Optimization
Tuesday, August 25, 8:30 - 10:00
Session chair: Marc H. Scholl (University of Konstanz, Germany)

Design and Analysis of Parametric Query Optimization Algorithms.
Sumit Ganguly (Indian Inst. of Technology, Kanpur, India)

Inferring Function Semantics to Optimize Queries.
Mitch Cherniack, Stan Zdonik (Brown University, USA)

TOPAZ: A Cost-Based, Rule-Driven, Multi-Phase Parallelizer.
Clara Nippl, Bernhard Mitschang (Technische Universitaet Muenchen, Germany)

Research Session 8. Query Processing and Optimization
Tuesday, August 25, 10:30 - 12:00
Session chair: Daniel Lieuwen (Lucent Bell Labs, USA)

Filtering with Approximate Predicates.
Narayanan Shivakumar, Hector Garcia-Molina, Chandra Chekuri (Stanford University, USA)

Optimal Histograms with Quality Guarantees.
H. V. Jagadish (AT&T Labs, USA), Nick Koudas (University of Toronto, Canada), S. Muthukrishnan (Bell Labs, USA), Viswanath Poosala (Bell Labs, USA), Ken Sevcik (University of Toronto, Canada), Torsten Suel (Bell Labs, USA)

Binding Propagation in Disjunctive Databases.
Sergio Greco (Universita della Calabria, Italy)

Research Session 9. Data Mining and Warehousing
Tuesday, August 25, 3:30 - 5:00
Session chair: Surajit Chaudhuri (Microsoft, USA)

Computing Iceberg Queries Efficiently.
Min Fang, Narayanan Shivakumar, Hector Garcia-Molina, Rajeev Motwani, Jeffrey D. Ullman (Stanford University, USA)

Clustering Categorical Data: An Approach Based on Dynamical Systems.
David Gibson (UC Berkeley, USA), Jon Kleinberg (Cornell University, USA), Prabhakar Raghavan (IBM Almaden Research Center, USA)

Incremental Clustering for Mining in a Data Warehousing Environment.
Martin Ester, Hans-Peter Kriegel, Jorg Sander, Michael Wimmer, Xiaowei Xu (University of Munich, Germany)

Research Session 10. High-Dimensional and Temporal Data
Wednesday, August 26, 10:30 - 12:00
Session chair: Timos Sellis (National Technical University of Athens, Greece)

On Optimal Node Splitting for R-trees.
Yvan Garcia, Mario Lopez, Scott Leutenegger (University of Denver, USA)

R-Tree Based Indexing of Now-Relative Bitemporal Data.
Rasa Bliujute, Christian S. Jensen, Simonas Saltenis, Giedrius Slivinskas (Aalborg University, Denmark)

Fast High-Dimensional Data Search in Incomplete Databases.
Beng Chin Ooi, Cheng Hian Goh, Kian-Lee Tan (National University of Singapore, Singapore)

Research Session 11. Data Mining I
Wednesday, August 26, 10:30 - 12:00
Session chair: Maria Orlowska (University of Queensland, Australia)

On the Discovery of Interesting Patterns in Association Rules.
Sridhar Ramaswamy (Bell Labs, USA), Sameer Mahajan (Informix Software, USA), Avi Silberschatz (Bell Labs, USA)

Algorithms for Mining Association Rules for Binary Segmentations of Huge Categorical Databases.
Yasuhiko Morimoto, Takeshi Fukuda, Hirofumi Matsuzawa, Takeshi Tokuyama, Kunikazu Yoda (IBM Tokyo Research Laboratory, Japan)

Algorithms for Mining Distance-Based Outliers in Large Datasets.
Edwin Knorr, Raymond Ng (University of British Columbia, Canada)

Research Session 12. Data Mining II
Wednesday, August 26, 1:30 - 3:00
Session chair: Christos Faloutsos (CMU, USA)

PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning.
Rajeev Rastogi, Kyuseok Shim (Bell Labs, USA)

RainForest - A Framework for Fast Decision Tree Construction of Large Datasets.
Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti (University of Wisconsin-Madison, USA)

WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases.
G. Sheikholeslami, S. Chatterjee, A. Zhang (SUNY at Buffalo, USA)

Research Session 13. System Issues I
Wednesday, August 26, 3:30 - 5:00
Session chair: S. Seshadri (Indian Institute of Technology, Bombay, India)

An Asynchronous Avoidance-based Cache Consistency Algorithm for Client Caching DBMSs.
M. Tamer Ozsu (GMD-IPSI, Germany), Kaladhar Voruganti (University of Alberta, Canada), Ronald Unrau (Cygnus Solutions, USA)

Design, Implementation, and Performance of the LHAM Log-Structured History Data Access Method.
Peter Muth (University of the Saarland, Germany), Patrick O'Neil (UMass/Boston, USA), Achim Pick, Gerhard Weikum (University of the Saarland, Germany)

Secure Buffering in Firm Real-Time Database Systems.
Binto George, Jayant Haritsa (Indian Inst. of Science, India)

Research Session 14. Data Warehousing
Wednesday, August 26, 5:30 - 7:00
Session chair: Tim Griffin (Lucent Bell Labs, USA)

Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing.
Guido Moerkotte (Mannheim University, Germany)

Materialized View Selection for Multidimensional Datasets.
Amit Shukla, Prasad Deshpande, Jeffrey Naughton (University of Wisconsin-Madison, USA)

Expiring Data in a Warehouse.
Hector Garcia-Molina, Wilburt Labio, Jun Yang (Stanford University, USA)

Research Session 15. System Issues II
Wednesday, August 26, 5:30 - 7:00
Session chair: Christoph Freytag (Humboldt University of Berlin, Germany)

Safely and Efficiently Updating References During On-line Reorganization.
Chendong Zou (Informix Software, USA), Betty Salzberg (Northeastern University, USA)

Buffering and Read-Ahead Strategies for External Mergesort.
Weiye Zhang, Per-Ake Larson (Microsoft Corporation, USA)

Bulk-Loading Techniques for Object Databases and an Application to Relational Data.
Sihem Amer-Yahia, Sophie Cluet (INRIA, France), Claude Delobel (University of Paris XI, France)

Research Session 16. Spatial Data
Thursday, August 27, 10:30 - 12:00
Session chair: Christian S. Jensen (Aalborg University, Denmark)

Algorithms for Querying by Spatial Structure.
Dimitris Papadias, Nikos Mamoulis (Hong Kong University of Science and Technology, Hong Kong), Vasilis Delis (University of Patras, Greece)

A Raster Approximation For Processing of Spatial Joins.
Geraldo Zimbrao, Jano Moreira de Souza (Federal University of Rio de Janeiro, Brazil)

Scalable Sweeping-Based Spatial Join.
Lars Arge (Duke University, USA), Octavian Procopiuc (Duke University, USA), Sridhar Ramaswamy (Bell Labs, USA), Torsten Suel (Bell Labs, USA), Jeffrey Scott Vitter (Duke University, USA)

Research Session 17. Data Mining III
Thursday, August 27, 10:30 - 12:00
Session chair: Paula Hawthorn (Andromedia, USA)

Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining.
Flip Korn, Alexandros Labrinidis, Yannis Kotidis, Christos Faloutsos (University of Maryland, USA)

Scalable Techniques for Mining Causal Structures.
Craig Silverstein, Sergey Brin, Rajeev Motwani, Jeff Ullman (Stanford University, USA)

Mining surprising patterns using temporal description length.
Soumen Chakrabarti, Sunita Sarawagi, Byron Dom (IBM Almaden Research Center, USA)

Industrial Sessions

Industrial Session 1. Database Vendors Meet Complex Query Languages
Monday, August 24, 11:30 - 1:00
Session chair: Patrick Valduriez (INRIA, France)

Massive Stochastic Testing of SQL
Don Slutz (Microsoft Research, USA)

Selectivity Estimation in Extensible Databases - A Neural Network Approach
Seetha Lakshmi and Shaoyu Zhou (Informix Software Inc., USA)

The Drill Down Benchmark
Peter A. Boncz (University of Amsterdam, Netherlands), Tim Rühl (Data Distilleries BV, Netherlands), Fred Kwakkel (Data Distilleries BV, Netherlands)

Industrial Session 2. Experience with Large Data Warehouses
Tuesday, August 25, 10:30 - 12:00
Session chair: Mauricio Lopez (Bull, France)

Issues in Developing Very Large Data Warehouses
Lyman Do, Pamela Drew, Wei Jin, Vish Jumani, David Van Rossum (Boeing Company, USA)

The National Medical Practice Knowledge Bank
Warren Sterling (NCR Parallel Systems, USA)

Bank of America: Case Study - The Information Currency Advantage
Felipe Carino, Mark Jahnke (NCR Teradata, USA)

Industrial Session 3. New Technology Database Vendor Offerings
Tuesday, August 25, 3:30 - 5:00
Session chair: Martin Kersten (CWI, Netherlands)

DTL's DataSpot: Database Exploration using Plain Language
Shaul Dar, Gadi Entin, Shai Geva, Eran Palmon (Data Technologies Ltd., Israel)

From Data to Knowledge Independence: An on-going Story
Laurent Vieille (Next Century Media, Inc., France)

Federating Databases with IRO-DB
Peter Fankhauser (GMD, Germany), Georges Gardarin (University of Versailles, France), Mauricio Lopez (Bull, France), Jose Munoz (Ibermatica, Spain), Anthony Tomasic (INRIA, France)

Industrial Session 4. Database Vendor Internals: the Oracle Story
Wednesday, August 26, 10:30 - 12:00
Session chair: Jim Gray (Microsoft Research, USA)

Materialized Views in Oracle
Karl Dias, Alan Downing, Bill Norcott, Harry Sun, Randy Bello, Jay Feenan, Jim Finnerty, Andy Witkowski, Mohamed Ziauddin (Oracle Corporation, USA)

Incremental Checkpointing in Oracle
Ashok Joshi, William Bridge, Juan Loaiza (Oracle Corporation, USA)

Architecture of Oracle Parallel Server
R. Bamford, D. Butler, B. Klots, N. Macnaughton (Oracle Corporation, USA)

Industrial Session 5. Database Vendor Kernels
Wednesday, August 26, 1:30 - 3:00
Session chair: Munir Cochinwala (Bellcore, USA)

KODA - The Architecture and Implementation of a Data Model Independent Kernel
Gopalan Arun and Ashok Joshi (Oracle Corporation, USA)

The ADABAS Buffer Pool Manager
Harald Schoning (Software AG, Germany)

A Database System for Real-Time Event Aggregation in Telecommunication
Jerry Baulier, Stephen Blott, Henry F. Korth, Avi Silberschatz (Bell Labs, USA)

Industrial Session 6. Database Vendor Support for Data Warehouses
Wednesday, August 26, 3:30 - 5:00
Session chair: Dennis Shasha (Courant Institute, New York University, USA)

Building Petabyte Databases With Objectivity/DB
Leon Guzenda (Objectivity, Inc., USA)

Optimizing Queries in DataJoiner using Optimizer Morphing
Shivakumar Venkataraman and Tian Zhang (IBM, USA)

Plan-Per-Tuple Optimization Solution -- Parallel Execution of Expensive User-Defined Functions
Felipe Carino, William O'Connell (NCR Teradata, USA)

Panels

Panel 1. Is Web-site Management a Database Problem?
Monday, August 24, 4:30 - 6:00
Organizers: Alon Levy (University of Washington, USA), Daniela Florescu (INRIA, France) and Dan Suciu (AT&T Labs, USA)
Session Chair: Stefano Ceri (Politecnico di Milano, Italy)

The applicability of database technology to the Internet and the World-Wide Web has been heatedly discussed at several recent events (e.g., DeWitt's VLDB-95 talk, a 1996 DIMACS Web/DB workshop, an ICDE-98 panel). One of the areas that has emerged from these discussions as a candidate for impact by the database community is that of Web site construction and management. In parallel, several research projects have been started with the goal of addressing this problem (e.g., Strudel (AT&T Research), Araneus (University of Rome), YAT (INRIA, France) and WebOQL (University of Toronto)). The common theme of these projects is the declarative management of the content and structure of Web sites. These projects feed off previous relevant work on management of semistructured data and on data integration. In addition to the research activity, there has been a flurry of activity among database vendors to develop tools for serving data that is stored in databases. Other Web site management tools are being developed by non-database companies (e.g., products such as FrontPage, NetObjects, and many others). These products are starting to provide more and more features for incorporating data from multiple external sources and for managing the structure of Web sites.

The purpose of this panel is to discuss whether Web site management is a database problem (in whole, or at least in part). The panel will put forward several contradictory opinions on the topic. In particular, we expect some of the following opinions to be represented:

Web-site management is not a database problem. Even though it has some data management elements to it, these are minor. Web site management will remain mostly a combination of user interface issues and building flexible tools for writing CGI bin scripts. An analogy that often comes up here is with the area of network management where database technology did not contribute much.

Web-site management is a solved database problem. The best Web site management tools are already out there. They're called Oracle, Informix, DB2, Sybase, O2, etc. Adapting current database management systems to the problem of Web site management involves only minor twiddles and user interface additions to current systems.

Web-site management is (to a large extent) a database problem. However, when building a Web site management system based on database concepts, we need to rethink many of the assumptions we make in traditional database systems. For example, we need a new data model that also supports the modeling of the Web site structure, we need new query languages, design principles for Web sites, new data warehousing techniques, and new methods for embedding such systems in a programming environment.

Panel 2. Information, Communication, and Money: For What Can We Charge and How Can We Meter it?
Tuesday, August 25, 3:30 - 5:00
Organizers: Stephen Blott, Henry F. Korth, Avi Silberschatz (Bell Labs, USA)
Session Chair: Klaus Dittrich (Universitat Zurich, Switzerland)

Internet telephony, electronic commerce, on-line information services, and video conferencing are projected to grow rapidly in coming years. How can we charge for these and similar services? What information must be gathered to enable proper billing for services rendered? Where is this information generated and where is it collected? These questions are being addressed in today's marketplace in an ad-hoc manner. Since this is in essence a distributed information system problem, the database research community should be able to have a positive influence on the evolution of billing systems for electronic communication and information services. Billing systems represent a significant and growing database problem. They offer an application domain for real-time databases, materialized views, event aggregation, data mining, OLAP, etc. The goal of this panel is to create awareness of this application domain and to inform conference attendees of the key issues and opportunities via a (vigorous, we hope) debate. Among the issues we shall address are:

Usage metrics: connect time, distance, network hops, number of packets, number of bytes, etc.

Quality of service: delivery time, transmission quality, real-time issues, data resolution, etc.

Real-time pricing

Credit- and debit-based billing, authentication

Security

Timing constraints

Cost of metering versus cost of the service: Are flat access charges preferable?

Billing among service providers (called "settlements" in the industry)

For each of these issues, the emphasis will be on the database and distributed systems aspects of solving the problems presented.

Panel 3. Starting (and Sometimes Ending) a Database Company
Thursday, August 27, 10:30 - 12:00
Organizer: Jack Orenstein (Novera, USA)
Session Chair: Yannis Ioannidis (University of Athens, Greece and University of Wisconsin, USA)

Why does someone start a database company? Some become aware of being in the right place at the right time; they have the right technical insights and enough business sense to realize that they possess the right solution to a real problem. Some are on a religious mission; they are visionaries who see a better way and want to lead others to enlightenment. Some do it for the money. Sometimes, a database researcher gets tired of solving esoteric theoretical problems understood by maybe 1000 people in the world, and whose solution is comprehensible to perhaps 10. Do this long enough, and it's easy to understand the appeal of building something that will be used by millions.

What's it like to start a database company? When do you start to wonder, "why did I ever think this was a good idea?" or "why didn't I do this years ago?" How does the company change over the years? What's it like when you have real customers who depend on you and you have to support them instead of building a four-phase, six-color locking scheme, or rewriting the query optimizer? At each stage of the company, what was the bane of your existence? Are you ever comfortable with the promises being made by your sales and marketing guys? How do you evolve your role within the company as it goes through infancy, childhood, adolescence, middle age, and possibly even senescence?

What's life like when it's over? What do you do when the company folds or you leave? Can you go back to research? If you do, when do you start to wonder, "why did I ever think this was a good idea?" or "why didn't I do this years ago?"

Exhibits

The Cubetree Storage Organization
Advanced Communication Technology, Inc., USA

DataBlitz: A High Performance Main-Memory Storage Manager
Bell Laboratories, USA

Compaq Industrial Exhibit
Compaq, USA

Bridging Heterogeneity: Research and Practice of Database Middleware Technology
Daimler-Benz, Germany

IBM Industrial Exhibit
IBM, USA

Informix Industrial Exhibit
Informix, USA

RANGER, a distributed integration platform developed for the financial sector
Inventure America, USA

MapInfo SpatialWare. A Spatial Information Server for RDBMS
MapInfo Corporation, USA

Microsoft Industrial Exhibit
Microsoft, USA

A Single Pass Computing Engine for Interactive Analysis of VLDB
Mihalisin Associates, USA

Objectivity Industrial Exhibit
Objectivity Inc., USA

Oracle Industrial Exhibit
Oracle Corporation, USA