TUTORIAL PROGRAM
Tutorial 1: Database Architectures for New Hardware
[Presentation PDF] [Handout (2 slides/page)] [Notes (3 slides/page with space for note)]
Anastassia Ailamaki, Carnegie Mellon University
Room 4, Tuesday 11:00-12:30 and 2:00-3:30
Thirty years ago, DBMSs stored data on disk and cached recently used data in main-memory buffer pools, while designers worried about improving I/O performance and maximizing main-memory utilization. Today, however, databases live in multi-level memory hierarchies that include disks, main memories, and several levels of processor caches. Four (often correlated) factors have shifted the performance bottleneck of data-intensive commercial workloads from I/O to the processor and memory subsystem. First, storage systems are becoming faster and more intelligent (disks now come complete with their own processors and caches). Second, modern database storage managers aggressively improve locality through clustering, hide I/O latencies using prefetching, and parallelize disk accesses using data striping. Third, main memories have become much larger and often hold the application's working set. Finally, the widening memory/processor speed gap has heightened the importance of processor caches to database performance.
This tutorial will first survey techniques proposed in the computer architecture and database literature for understanding and evaluating database application performance on modern hardware. We will present approaches and methodologies used to produce time breakdowns when executing database workloads on modern processors. Then, we will survey techniques proposed to alleviate the problem, with major emphasis on data placement and prefetching techniques and their evaluation. Finally, we will discuss open problems and future directions: Is the memory subsystem the only thing database software architects should worry about? How strongly do other processor design decisions affect database workload behavior? Given emerging multi-threaded, multi-processor computers with modular, deep cache hierarchies, how feasible is it to create database systems that adapt to their environment and automatically take full advantage of the underlying hierarchy?
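To make the data-placement discussion concrete, the following Python sketch (our illustration, not material from the tutorial) contrasts a row-interleaved record layout with a column-grouped one in the spirit of cache-conscious schemes such as PAX; the relation, attribute names, and sizes are invented.

```python
# A minimal sketch (ours, not the tutorial's) of cache-conscious data
# placement. A row layout interleaves all attributes of a record, so a scan
# of one attribute drags every other attribute through the cache; grouping
# each attribute into its own contiguous array (in the spirit of PAX) keeps
# the scanned bytes dense. Relation and attribute names are invented.
from array import array

# Row layout: each record is an (id, price, quantity) tuple.
rows = [(i, 1.0 * i, i % 10) for i in range(100_000)]

# Column-grouped layout: one contiguous array per attribute.
ids = array("q", (r[0] for r in rows))
prices = array("d", (r[1] for r in rows))
quantities = array("q", (r[2] for r in rows))

def total_price_row_layout():
    # Touches every field of every record just to read one attribute.
    return sum(r[1] for r in rows)

def total_price_column_layout():
    # Scans a single dense array of doubles; only the needed bytes move.
    return sum(prices)

assert total_price_row_layout() == total_price_column_layout()
```

In a compiled system the column scan touches a fraction of the cache lines the row scan does; the Python version only illustrates the two layouts themselves.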
Anastassia Ailamaki, CMU
Anastassia Ailamaki received a B.Sc. degree in Computer Engineering from the Polytechnic School of the University of Patras, Greece, M.Sc. degrees from the Technical University of Crete, Greece, and from the University of Rochester, NY, and a Ph.D. degree in Computer Science from the University of Wisconsin-Madison. In 2001, she joined the Computer Science Department at Carnegie Mellon University as an Assistant Professor. Her research interests are in the broad area of database systems and applications, with emphasis on database system behavior on modern processor hardware and disks. Her projects at Carnegie Mellon (including Staged Database Systems, Cache-Resident Data Bases, and the Fates Storage Manager) aim at building systems that strengthen the interaction between database software and the underlying hardware and I/O devices. Her other research interests include automated database design for scientific databases, storage device modeling, and Internet querying. She has received three best-paper awards (VLDB 2001, Performance 2002, and ICDE 2004), an NSF CAREER award (2002), and IBM Faculty Partnership awards in 2001, 2002, and 2003. She is a member of IEEE and ACM.
Tutorial 2: Security of Shared Data in Large Systems: State of the Art and Research Directions
[Presentation PPS] [Handout (2 slides/page)] [Notes (3 slides/page with space for note)]
Arnon Rosenthal and Marianne Winslett
Room 4, Tuesday 4:00-5:30 and Wednesday 11:00-12:30
Security is increasingly recognized as a key impediment to sharing data in enterprise systems, virtual enterprises, and the semantic web. Yet the topic has not been a focus of mainstream database research, industrial progress in data security has been slow, and (too) much security enforcement lives in application code, or else is coarse-grained and insensitive to data contents. Today the database research community is in an excellent position to make significant improvements in the way people think about security policies, thanks to the community's experience with declarative and logic-based specifications, automated compilation and physical design, and both semantic and efficiency issues in federated systems. These strengths provide a foundation for improving both theory and practice. This tutorial aims to enlighten the VLDB community about the state of the art in data security, especially for enterprise-scale and larger systems, and to engage the community's interest by showing how database researchers can help improve that state of the art.
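As a hedged illustration of the contrast the abstract draws between coarse-grained checks and content-sensitive policies, here is a small Python sketch; the records, roles, and the policy rule are invented for this example and are not drawn from the tutorial.

```python
# Invented example contrasting coarse-grained, application-code enforcement
# with a content-sensitive, declarative-style row policy. Records, roles,
# and the policy rule are hypothetical.
records = [
    {"patient": "alice", "ward": "oncology", "physician": "dr_kim"},
    {"patient": "bob", "ward": "cardiology", "physician": "dr_roy"},
]

def coarse_grained(user):
    # All-or-nothing: a user either sees the whole table or none of it.
    return records if user["role"] == "admin" else []

def content_sensitive(user):
    # Row-level rule evaluated against data contents: physicians see
    # only the rows for their own patients.
    return [r for r in records if r["physician"] == user["name"]]

print(content_sensitive({"name": "dr_kim", "role": "physician"}))
# [{'patient': 'alice', 'ward': 'oncology', 'physician': 'dr_kim'}]
```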
Arnie Rosenthal, MITRE
Arnie Rosenthal is a Principal Scientist at MITRE. He has broad interests
in problems that arise when data is shared between communities, including
a long-standing interest in the security issues that arise in data warehouses,
federated databases, and enterprise information systems. He has also had
a first-hand look at many security problems that arise in large government
and military organizations.
Marianne Winslett, University of Illinois
Marianne Winslett has been a professor at the University of Illinois since
1987. She started working on database security issues in the early 1990s,
focusing on semantic issues in MLS databases. Her interests soon shifted
to issues of trust management for data on the web. Trust negotiation is
her main current research focus.
Tutorial 3: Self-Managing Technology in Database Management Systems
Part 1: [Presentation PPS] [Handout (2 slides/page)] [Notes (3 slides/page with space for note)]
Part 2: [Presentation PPS] [Handout (2 slides/page)] [Notes (3 slides/page with space for note)]
Part 3: [Presentation PPS] [Handout (2 slides/page)] [Notes (3 slides/page with space for note)]
Part 4: [Presentation PPS] [Handout (2 slides/page)] [Notes (3 slides/page with space for note)]
Surajit Chaudhuri, Benoit Dageville, Guy Lohman
Room 4, Wednesday 2:00-4:00 and 4:30-5:30
The rising cost of labor relative to hardware and software means that the total cost of ownership of information technology is increasingly dominated by people costs. In response, all major providers of information technology are attempting to make their products more self-managing. In this tutorial, we will introduce the core concepts of self-managing technology and discuss their implications for database management systems. We will review the state of the art of self-managing technology in the IBM DB2, Microsoft SQL Server, and Oracle products, and describe the wealth of research problems that remain open.
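To give a flavor of one core self-managing idea (our illustration only, not how any of the named products work), the sketch below scores hypothetical candidate indexes against an observed workload and recommends those whose estimated benefit outweighs an assumed maintenance cost; all numbers, query strings, and index names are invented.

```python
# Invented sketch of a what-if style index recommender: estimate each
# candidate index's benefit against an observed workload and keep those
# whose benefit exceeds an assumed maintenance cost. All numbers, query
# strings, and index names are hypothetical.
workload = {                       # query template -> observed frequency
    "SELECT ... WHERE cust_id = ?": 500,
    "SELECT ... WHERE order_date > ?": 120,
}

candidates = {                     # index -> estimated per-query speedup
    "idx_cust_id": {"SELECT ... WHERE cust_id = ?": 0.9},
    "idx_order_date": {"SELECT ... WHERE order_date > ?": 0.5},
}

MAINTENANCE_COST = 100.0           # assumed per-index update overhead

def benefit(index):
    speedups = candidates[index]
    return sum(freq * speedups.get(q, 0.0) for q, freq in workload.items())

recommended = [ix for ix in candidates if benefit(ix) > MAINTENANCE_COST]
print(recommended)  # ['idx_cust_id'] under these invented numbers
```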
Surajit Chaudhuri, Microsoft Research
Surajit Chaudhuri leads the Data Management and Exploration Group at Microsoft Research (http://research.microsoft.com/dmx). In 1996, Surajit started the AutoAdmin project at Microsoft Research to address the challenges in building self-tuning database systems. This project developed novel automated index tuning technology that shipped with SQL Server 7.0 in 1998. The AutoAdmin project has since continued to develop self-tuning and self-manageability technology in collaboration with the Microsoft SQL Server product team. Surajit's other project is Data Exploration, which looks at the problem of flexibly querying and discovering information that spans text as well as relational data. Surajit is also interested in the problems of data cleaning and integration. He received his Ph.D. from Stanford University in 1991 and worked at Hewlett-Packard Laboratories, Palo Alto, from 1991 to 1995. He has published widely in major database conferences (http://research.microsoft.com/users/surajitc).
Benoit Dageville, Oracle Corp.
Benoit Dageville is a consulting member in the Oracle Database Server Technologies division at Oracle Corporation in Redwood Shores, California. His main areas of expertise include parallel query processing, ETL (Extract, Transform, and Load), large data warehouse benchmarks (e.g., TPC-H), and SQL execution and optimization. Since 1999, he has been one of the lead architects of the self-managing database initiative at Oracle. Major features resulting from this effort include Automatic SQL Memory Management (Oracle9i) and, in Oracle10g, the Automatic Workload Repository, the Automatic Database Diagnostic Monitor, and Automatic SQL Tuning. Dr. Dageville graduated from the University of Paris VI, France, in 1995 with a Ph.D. degree in computer science, specializing in parallel database management systems under the supervision of Dr. Patrick Valduriez. His research and industrial work has resulted in several refereed papers in international conferences and journals.
Guy M. Lohman, IBM Almaden
Guy M. Lohman is Manager of Advanced Optimization in the Advanced Database
Solutions Department at IBM Research Division's Almaden Research Center
in San Jose, California, and has 22 years of experience in relational
query optimization at IBM. He is the architect of the Optimizer of the
DB2 Universal Database (UDB) for Linux, UNIX, and Windows, and was responsible
for its development from 1992 to 1997. Prior to that, he was responsible
for the optimizers of the Starburst extensible object-relational and R*
distributed prototype DBMSs. More recently, Dr. Lohman co-invented and
designed the DB2 Index Advisor (predecessor to today's Design Advisor),
and in 2000 co-founded the DB2 Autonomic Computing project (formerly known
as SMART -- Self-Managing And Resource Tuning), now part of IBM's company-wide
Autonomic Computing initiative. In 2002, Dr. Lohman was elected to the
IBM Academy of Technology. Dr. Lohman received his Ph.D. in Operations
Research in 1976 from Cornell University. He is the author of over 40
papers in the refereed literature and the holder of 13 U.S. patents. His
current research interests involve query optimization and self-managing
database systems.
Tutorial 4: Architectures and Algorithms for Internet-Scale (P2P) Data Management
[Presentation PPS] [Handout (2 slides/page)] [Notes (3 slides/page with space for note)]
Joseph M. Hellerstein
Room 4, Thursday 2:00-3:30 and 4:00-5:30
The database community prides itself on scalable data management solutions. In recent years, a new set of scalability challenges has arisen in the context of peer-to-peer (p2p) systems on the Internet, in which the scaling metric is the number of participating computers rather than the number of bytes stored. The best-known application of p2p technology to date has been file sharing, but there are compelling new application agendas in Internet monitoring, content distribution, distributed storage, multi-user games, and next-generation Internet routing. The energy behind p2p technology has led to a renaissance in the distributed algorithms and distributed systems communities, much of which directly addresses issues in massively distributed data management. Moreover, many of these ideas have applicability beyond the "pure p2p" context of Internet end-user systems, with ramifications for any large-scale distributed system whose scale makes traditional administrative models untenable. Internet-scale systems present numerous unique technical challenges, including steady-state "churn" (nodes joining and leaving), the need to evolve and scale without reconfiguration, the absence of ongoing system administration, and adversarial participants in the processing. In this tutorial, we will focus on key data management building blocks, including network indirection architectures, persistence models, network embeddings of computations, resource management, and security/trust challenges. We will also discuss motivations for the use of these technologies. We will ground the presentation in experiences from a set of deployed systems, and present open challenges that have arisen in this context.
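As one concrete example of a network indirection building block, the Python sketch below (an illustration under our own assumptions, not code from any deployed system) implements consistent hashing, the key-placement idea underlying many distributed hash table (DHT) designs; node and key names are illustrative.

```python
# Minimal consistent-hashing sketch of the indirection idea behind many
# DHT designs: a key is owned by the first node clockwise from its hash
# on a ring, so nodes can join or leave with only local re-assignment.
# Node and key names are illustrative.
import hashlib
from bisect import bisect

def ring_hash(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    def __init__(self, nodes):
        self.points = sorted((ring_hash(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        # First node at or after the key's position, wrapping around.
        i = bisect(self.points, (ring_hash(key),)) % len(self.points)
        return self.points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.lookup("some-file.mp3"))  # the node responsible for this key
```

Because only the arc between a departing node and its successor changes owners, churn triggers local rather than global data movement, which is why variants of this scheme recur across DHT architectures.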
Joseph M. Hellerstein, University of California, Berkeley and Intel Research Berkeley
Joseph M. Hellerstein is a Professor of Computer Science at the University of California, Berkeley, and the Director of Intel Research Berkeley. He is an Alfred P. Sloan Research Fellow and a recipient of multiple awards, including ACM SIGMOD's "Test of Time" award for his first published paper, an NSF CAREER award, a NASA New Investigator award, an Okawa Foundation Fellowship, and IBM's Best Paper in Computer Science award. In 1999, MIT's Technology Review named him one of the top 100 young technology innovators worldwide in its inaugural "TR100" list. Hellerstein's research focuses on data management and movement, including database systems, sensor networks, and peer-to-peer and distributed systems. Prior to his position at Intel Research, Hellerstein was a co-founder of Cohera Corporation (now part of PeopleSoft), where he served as Chief Scientist from 1998 to 2001. Key ideas from his research have been incorporated into commercial and open-source database systems, including IBM's DB2 and Informix, PeopleSoft's Catalog Management, and the open-source PostgreSQL system. Hellerstein currently serves on the technical advisory boards of a number of software companies, and has served as a member of the advisory boards of ACM SIGMOD and Ars Digita University. Hellerstein received his Ph.D. from the University of Wisconsin-Madison, a master's degree from UC Berkeley, and a bachelor's degree from Harvard College. He spent a pre-doctoral internship at IBM Almaden Research Center and a post-doctoral internship at the Hebrew University in Jerusalem.
Tutorial 5: The Continued Saga of DB-IR Integration
[Presentation PDF] [Handout (2 slides/page)] [Notes (3 slides/page with space for note)]
Ricardo Baeza-Yates and Mariano Consens
Confederation 5 & 6, Friday 9:00-10:30 and 11:00-12:30
The world of data has developed from two main points of view: the structured relational data model and the unstructured text model. The two distinct cultures of databases and information retrieval now have a natural meeting place in the Web, with its semi-structured XML model. As Web-style searching becomes the ubiquitous tool, the need to integrate these two viewpoints becomes ever more important. In this tutorial we explore the differences, the problems, and the techniques for DB-IR integration across a range of applications. The tutorial will provide an overview of the different approaches put forward by the IR and DB communities and survey DB-IR integration efforts, discussing both earlier proposals and recent ones (particularly in the context of XML) and covering a variety of application scenarios. The objective is to give an overview of the issues and approaches developed for integrating database and information retrieval systems. The target audience includes researchers in database systems as well as developers of Web and database/information retrieval applications.
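One recurring DB-IR integration pattern combines a structured, DB-style predicate with IR-style relevance ranking over the surviving records. The toy Python sketch below (our illustration, with an invented collection and query, not material from the tutorial) shows the shape of such a combined query.

```python
# Invented sketch of one DB-IR integration pattern: a structured (DB-style)
# predicate narrows the collection, then an IR-style score ranks the
# survivors by keyword overlap. The collection and query are toy data.
books = [
    {"year": 2001, "title": "modern information retrieval systems"},
    {"year": 1999, "title": "relational database design"},
    {"year": 2003, "title": "xml retrieval and ranking"},
]

def search(keywords: str, min_year: int):
    terms = set(keywords.split())
    hits = [b for b in books if b["year"] >= min_year]   # structured filter
    scored = [(len(terms & set(b["title"].split())), b) for b in hits]
    return [b for score, b in sorted(scored, key=lambda p: -p[0]) if score]

print(search("xml retrieval", 2000))  # the 2003 book ranks above the 2001 one
```

A real integration replaces the keyword-overlap score with a proper IR model (e.g., TF-IDF) and pushes both the predicate and the ranking into one optimizer, which is exactly where the interesting research questions arise.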
Ricardo Baeza-Yates, University of Chile
Ricardo Baeza-Yates is a professor at the Computer Science Department of the University of Chile, which he chaired from 1993 to 1995. He is also the director of the Center for Web Research, a project funded by the Millennium Scientific Initiative of the Ministry of Planning. He received his bachelor's degree in Computer Science (CS) in 1983 from the University of Chile. In 1985 he also received an M.Sc. in computer science and the professional title in electrical engineering (EE), and a year later an M.Eng. in EE from the same university. He received his Ph.D. in CS from the University of Waterloo, Canada, in 1989, followed by a six-month post-doctoral position there the same year. His research interests include information retrieval, algorithms, and information visualization. He is co-author of the book Modern Information Retrieval, published in 1999 by Addison-Wesley; co-author of the second edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among other publications in journals published by ACM, IEEE, SIAM, etc. In 1993, Ricardo received the Organization of American States award for young researchers in exact sciences. In 1994 he received the Institute of Engineers of Chile's award for the best engineering research of the previous four years, and was invited by the U.S. Presidential Office on a one-month scientific tour of that country. In 1996, he won a scholarship from the Education and Science Ministry of Spain for a sabbatical year at the Polytechnic University of Catalunya. In 1997, with two Brazilian colleagues, he received the COMPAQ prize for the best Brazilian research article in computer science. As a Ph.D. student he won several scholarships, including the Ontario Graduate Scholarship, the Institute for Computer Research scholarship for graduate studies, the Information Technology and Research Centre graduate scholarship, the University of Waterloo graduate student award, and the Department of Computer Science Fellowship. Ricardo's professional activities include the presidency of the Chilean Computer Science Society (SCCC) from 1992 to 1995 and from 1997 to 1998. From 1998 to 2000 he was in charge of the IEEE-CS chapter in Chile, and he has been involved in the South American ACM Programming Contest since 1998. He is currently the president of CLEI, a Latin American association of CS departments, and coordinates the Iberoamerican cooperation program in Electronics and Informatics. He was recently elected to the IEEE CS Board of Governors for the period 2002-04. In 2002 he was appointed to the Chilean Academy of Sciences, the first person from computer science to achieve this position in Chile.
Mariano Consens, University of Toronto
Mariano Consens is a faculty member in Information Engineering at the
MIE Department, University of Toronto, which he joined in 2003. Before
that, he was research faculty at the School of Computer Science, University
of Waterloo, from 1994 to 1999. He received his Ph.D. and M.Sc. degrees in Computer Science from the University of Toronto. He also holds a Computer Systems Engineer degree from the Universidad de la Republica, Uruguay.
Mariano's research interests are in the areas of Data Management Systems
and the Web, with a current focus on XML searching, autonomic systems
and pervasive computing. He has over 20 publications and two patents,
including journal publications selected from best conference papers. In
addition to his academic positions, he has been active in the software industry as a founder and Director of Freedom Intelligence (a query engine provider), as the CTO of Classwave Wireless (a Bluetooth software infrastructure company), and as a technology advisor for Xign (an electronic payment systems supplier), OpenText (an early web search engine turned knowledge management software vendor), and others.