PANEL PROGRAM

Tuesday, August 20: Panel 1 (Brodie)
Wednesday, August 21: Panel 2 (Freytag)
Friday, August 23: Panel 3 (Cushing)
PANEL 1: TUESDAY, 20 AUGUST 2002, 16:30-18:00

Data Management Challenges in Very Large Enterprises
Panelists
- Adam Bosworth, VP, Engineering, BEA Systems, Inc., U.S.A.
- James Hamilton, Architect, Microsoft SQL Server, Microsoft Corp., U.S.A. (PDF Presentation Slides - 324K; PowerPoint Presentation Slides - 364K)
- Pat Selinger, IBM Fellow and VP, Data Management Architecture and Technology, IBM, U.S.A. (PDF Presentation Slides - 124K)
- Hans-Peter Steiert, Research & Technology, DaimlerChrysler AG, Germany (PDF Presentation Slides - 624K)
Panel Outline
Very large enterprises have approximately a petabyte of operational data stored in over 1,000 data repositories supporting over 5,000 applications. Data storage volumes grow in excess of 50% annually. Repositories for decision support systems, which often contain replicated data, grow two to three times as fast as databases used for online transaction processing (OLTP). OLTP workloads are growing at over 60% per year. This growth is expected to continue for some time due to new Web-based systems, increased access to existing systems, and the introduction of new sources of data, new workloads, and new (e.g., XML-based) access requirements. While dealing with massive growth, large enterprises must also address the unpredictable or elastic access demands of constantly evolving Web-based systems, increased storage complexity, new storage technologies (e.g., network data storage over IP, storage utilities), and the more conventional but increasingly complex and costly challenges of data and storage management (e.g., backup and recovery).
While data management and data storage technologies continue to make impressive advances, there is only so much they can do in the face of the predicted growth rates. Very large enterprises are therefore attempting to identify and address the drivers of data growth. A leading candidate is integration. Recent analyst studies conclude that over 40% of IT budgets are devoted to the integration of new and existing systems and databases. Technology advances often manifest themselves in new systems and databases rather than in improvements and enhancements to existing systems. Consequently, very large enterprises operate their businesses with thousands of systems and databases ranging in age from six weeks to thirty years. Operational efficiency requires that these systems be integrated. The Web's potential for universal access adds further urgency to these challenges. As a result, very large enterprises must deal not only with massive data and workload growth and the attendant management activities, but also with massive integration challenges and costs.
Solution providers continue to offer significant advances to deal with specific data storage and data management problems (e.g., availability, robustness, performance) and are beginning to turn their attention to the integration challenge.
Current solutions tend not to map directly to the problems of very large enterprises. They are seldom comprehensive and are usually product or vendor specific. Three approaches to integrating component solutions into an enterprise solution are standards, consultants, and integrated product suites. Standards, such as those for Web Services, are intended to provide common specifications for all products in a domain so that products from different vendors can be readily integrated. Consultants are intended to be vendor neutral while bringing a wealth of experience and knowledge to multi-vendor problems. The third approach is for vendors to combine their products into tightly integrated product suites. Each approach has severe limitations. Very large enterprises are looking more than ever to solution providers to assist with their massive data management challenges.
The panel will identify the dominant data management challenges facing very large enterprises from the perspective of the problem owners and will explore the solutions being offered by leading solution providers. It will discuss specific VLDB challenges and how the proposed solutions address them.
PANEL 2: WEDNESDAY, 21 AUGUST 2002, 16:30-18:00

The Future Home of Data
Panelists
- Michael Franklin, University of California, Berkeley, U.S.A. (PDF Presentation Slides - 424K)
- Paul Larson, Microsoft Research, U.S.A. (PDF Presentation Slides - 20K)
- Guy Lohman, IBM Almaden Research, U.S.A. (PDF Presentation Slides - 44K)
- Guido Moerkotte, Universität Mannheim, Germany (PDF Presentation Slides - 60K)
Panel Outline
Over the last year, the question of how and where to store data, and how to access it, has become a pressing issue. Especially with the growing importance of E-Commerce and widespread Internet access, it is no longer clear that a single approach satisfies all the needs of the different communities accessing and processing data.
XML currently seems to be the most favored form for presenting, exchanging, and possibly storing data. New technology "waves" such as Web Services, an "interaction model" between businesses and customers (B2C) or among businesses themselves (B2B), rely heavily on XML data because semantic information can be included with the data itself. Many companies in the E-Commerce space assume that XML will be the "universal data format" of the future for storing and accessing data.
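As a minimal sketch of what "semantic information included with the data itself" means in practice (a hypothetical illustration; the record layout, element names, and values below are invented for this example and are not taken from the panel), compare a flat record with its XML counterpart, parsed here with Python's standard library:

    # Hypothetical sketch: the same record as a "flat" line and as XML.
    # In the XML form, element and attribute names travel with the values.
    import csv
    import io
    import xml.etree.ElementTree as ET

    flat_record = "4711,ACME Corp,2002-08-21,199.00"   # meaning depends on knowing the column order
    xml_record = """<order id="4711">
      <customer>ACME Corp</customer>
      <date>2002-08-21</date>
      <total currency="EUR">199.00</total>
    </order>"""

    # Reading the flat record requires out-of-band knowledge of the schema.
    row = next(csv.reader(io.StringIO(flat_record)))
    print(row)  # ['4711', 'ACME Corp', '2002-08-21', '199.00']

    # Reading the XML record: the markup itself says what each value is.
    order = ET.fromstring(xml_record)
    print(order.get("id"), order.findtext("customer"), order.findtext("total"))

The point of the sketch is only that the XML document can be interpreted, exchanged, or validated without a separately agreed column layout, which is why it is attractive as an exchange and, possibly, a storage format.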
Database management systems (DBMSs) have been the "modern" technology for storing, accessing, and manipulating large amounts of data for the last 40 years (with different data models and different levels of functionality and sophistication). This technology comes with many properties, embedded in the subsystems of DBMSs, that are essential for reliable, efficient information processing in the business world, such as transaction management and query processing. Over the years, DBMSs have changed their assumptions about where data might reside for access and processing. The initial assumption that all data is stored centrally changed to a "distributed model", which assumes that data is spread over several locations. The latter model quickly extended into a federated one, which assumes that "data sources" might be autonomous and not always under the control of one (database) system, while at the same time acknowledging that data might come in different formats.
Despite the "new
technology waves" one has to acknowledge that much
data is still stored as "flat files" without
structural information or other relevant "meta
data" that might be important for efficient access
and correct processing.
Besides the form of the data, its location, i.e., where a user should expect to find the data relevant to a particular task, has become an important issue. Devices such as mobile phones, notebooks, PDAs, and other mobile equipment often need data from other sources, while at the same time storing new or additional data that might be relevant for others to perform their tasks successfully. Some devices, such as mobile phones, exist in very large numbers and perform tasks quite different from those of a conventional computing device. Still, their capabilities as (limited) storage and processing devices make them important sources and stores of data that must be included in current and future processing environments.
Panel members from industry and academia have been asked to address the following issues in their statements:
- What are the different alternatives for storing and accessing data in the future?
- What are the characteristics of storage now and in the future?
- Is XML the universal answer to all future needs? Will it take over the "world of data"?
- Should existing DBMSs be thrown away?
- Should existing DBMSs be changed to adapt to new requirements and challenges?
- Are XML DBMSs the answer to all needs?
- Is the role/functionality of DBMSs changing?
- How do we deal with the many (mobile) devices as devices for data storage and data processing?
- What are the "right" assumptions about the distribution and heterogeneity of data?
- What is the industrial and/or academic approach to dealing with this challenge?
- What are the trade-offs between the different approaches?
PANEL 3: FRIDAY, 23 AUGUST 2002, 11:00-13:00

Biodiversity and Ecosystem Informatics - Research, Technology Transfer, or Application Development?

Panel Summary - 156K PDF file
Panelists
- Kathleen Bergen, University of Michigan, U.S.A. (PDF Presentation Slides - 708K)
- Yannis E. Ioannidis, University of Athens, Greece (PDF Presentation Slides - 32K)
- Jessie Kennedy, Napier University, U.K. (PDF Presentation Slides - 768K)
- Renée J. Miller, University of Toronto, Canada (PDF Presentation Slides - 580K)
Panel Outline
At VLDB 2000, a keynote speaker and a panel session urged database researchers to help solve critical problems in biodiversity informatics. A subsequent Spring 2001 report of a National Science Foundation Panel on Biodiversity and Ecosystem Informatics (BDEI) suggested that the next-generation CS/IT applications required to understand complex, ecosystem-scale processes would demand significant, ground-breaking CS/IT research. Considerable interest in this application domain has materialized, but as Maria Zemankova, NSF Program Officer, reported to the BDEI Workshop organizing committee after a talk at the National Library of Medicine, definitions of key terms and the list of research issues were not "jumping out". Some have suggested that the BDEI report is on the "nice" side rather than "hard-core". In short, before researchers or funding agencies get involved, they want to know in more detail how current research would transfer to this domain, and whether there are genuine research issues or "just" complex application development. I think that, at the time the BDEI report was written, we knew only that there were domain problems, not what original research might be required. We could identify research areas, but not articulate research topics. In fact, one might ask whether these (admittedly critical) domain problems need:
- original research
- off-the-shelf applications or a different (general) DBMS
- organizational infrastructure for supporting data re-use
- training for ecologists in technology that already exists
- more ecologists to do the domain research
It is, however, not possible to simply look at these problems and decide whether research is required to solve them; one must actually try. Thus, in the summer of 2001, the NSF awarded $1.25 million in grants to fifteen research groups to initiate planning projects and research initiation efforts in BDEI. By the summer of 2002, these projects should be in a position to determine whether the work they outlined in their proposals requires original research and, if so, what the nature of that research is.
This panel will report back to the VLDB community on the nature of the research issues in this domain. Using one or more concrete examples of funded BDEI research, panelists will present their view of where the work lies with respect to database issues (e.g., conceptual modeling, spatial and temporal databases, metadata, data mining, data integration) and whether the work constitutes (1) work in the domain by ecologists, e.g., training in existing technology, infrastructure for community databases or to support data reuse, or more ecologists doing field work; (2) applying existing DBMS technology; (3) applying existing DBMS research to create new technology; or (4) original DBMS (or other CS) research.