Keynotes
We are pioneering a new collaborative style of keynote address that pairs industrial speakers with academic speakers. The paired speakers are developing their talks in coordination, each in their own way.
On Tuesday, the keynote addresses are:
The Impact of Columnar In-Memory Databases on Enterprise Systems
| Abstract
Five years ago I proposed a common database approach for transaction processing and analytical systems using a columnar in-memory database, disputing the common belief that column stores are not suitable for transactional workloads. Today, the concept has been widely adopted in academia and industry, and it has been proven feasible to run analytical queries on large data sets directly on a redundancy-free schema, eliminating the need to maintain pre-built aggregate tables during data entry transactions. The resulting reduction in transaction complexity leads to a dramatic simplification of data models and applications, redefining the way we build enterprise systems. Initial analyses of productive applications adopting this concept confirm that system architectures enabled by in-memory column stores are conceptually superior for business transaction processing compared to row-based approaches. Additionally, our analyses show a shift of enterprise workloads toward even more read-oriented processing, due to the elimination of updates to transaction-maintained aggregates.
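To make the simplification concrete, here is a minimal sketch in Python using an in-memory SQLite database (the sales schema and function names are our own illustration, not from the talk). It contrasts a data-entry transaction that must also maintain a pre-built aggregate table with a redundancy-free design that aggregates the base table on demand.

    import sqlite3

    # Toy schema (hypothetical): a redundancy-free base table, plus the kind
    # of pre-built aggregate table the in-memory column-store approach removes.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    con.execute("CREATE TABLE sales_totals (region TEXT PRIMARY KEY, total REAL)")

    def insert_with_maintained_aggregate(region, amount):
        # Classic style: every data-entry transaction must also update the
        # pre-built aggregate, increasing transaction complexity and write load.
        con.execute("INSERT INTO sales VALUES (?, ?)", (region, amount))
        con.execute("INSERT OR IGNORE INTO sales_totals VALUES (?, 0)", (region,))
        con.execute("UPDATE sales_totals SET total = total + ? WHERE region = ?",
                    (amount, region))

    def insert_redundancy_free(region, amount):
        # Column-store style: the transaction only inserts; analytics aggregate
        # the base table directly, so no aggregate maintenance is needed.
        con.execute("INSERT INTO sales VALUES (?, ?)", (region, amount))

    insert_redundancy_free("EMEA", 120.0)
    insert_redundancy_free("EMEA", 80.0)
    print(con.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())
    # [('EMEA', 200.0)]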
Presenter
Hasso Plattner
Bio
Prof. Dr. h.c. mult. Hasso Plattner is one of the co-founders of SAP
AG and has been Chairman of the Supervisory Board since May 2003. In
this role and as Chief Software Advisor, he concentrates on defining
the medium- and long-term technology strategy and direction of SAP. He
also heads the Technology Committee of the SAP Supervisory Board.
Hasso Plattner received his Master’s Degree in Communications
Engineering from the University of Karlsruhe. In 1990, the University
of Saarbrücken awarded him an honorary doctorate and in 1994, he was
granted an honorary full professorship. In 1997, as chairman of SAP
America, Inc., co-chairman of SAP and the chief architect of SAP R/3,
Hasso Plattner received the Information Technology Leadership Award
for Global Integration as part of the Computerworld Smithsonian Awards
Program. In 1998, he was inducted into the German Hall of Fame. In
2002, Hasso Plattner was appointed Honorary Doctor, and in 2004
Honorary Professor by the University of Potsdam. Hasso Plattner also
founded the Hasso Plattner Institute (HPI) for IT Systems Engineering
at the University of Potsdam in 1998 with the largest single private
contribution to a university ever made in Germany. Through his
continuing financial support, he is helping the HPI in its efforts to
become a center for the education of world-class software specialists.
|
Breaking the Chains: On Declarative Data Analysis and Data Independence in the Big Data Era
| Abstract
Data management research, systems, and technologies have drastically improved the availability of data analysis capabilities, particularly for non-experts, due in part to low entry barriers and reduced ownership costs (e.g., for data management infrastructures and applications). Major reasons for the widespread success of database systems and today's multi-billion dollar data management market include data independence, separating physical representation and storage from the actual information, and declarative languages, separating the program specification from its intended execution environment. In contrast, today's big data solutions do not offer data independence and declarative specification. As a result, big data technologies are mostly employed in newly-established companies with IT-savvy employees or in large well-established companies with big IT departments. We argue that current big data solutions will continue to fall short of widespread adoption, due to usability problems, despite the fact that in-situ data analytics technologies achieve a good degree of schema independence. In particular, we consider the lack of a declarative specification to be a major roadblock, contributing to the scarcity of available data scientists and limiting the application of big data to the IT-savvy industries. Data scientists currently have to spend a lot of time tuning their data analysis programs for specific data characteristics and a specific execution environment. We believe that the research community needs to bring the powerful concepts of declarative specification to current data analysis systems in order to achieve broad big data technology adoption and effectively deliver the promise that novel big data technologies offer.
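To illustrate the distinction the abstract draws, here is a small hypothetical Python sketch (all names are our own). The first version hard-codes one physical execution strategy, so tuning for new data characteristics or a new execution environment means rewriting the loop; the second separates the declarative specification of what is wanted from how it is evaluated, the separation that a dataflow system such as Apache Flink provides at scale.

    from collections import defaultdict

    orders = [("EMEA", 120.0), ("APAC", 80.0), ("EMEA", 40.0)]

    # Imperative style: the programmer fixes one physical strategy (a single
    # sequential pass over a hash table). Re-tuning for other data sizes or
    # a distributed execution environment means rewriting this code.
    totals = defaultdict(float)
    for region, amount in orders:
        totals[region] += amount

    # Declarative style: the program states only *what* is wanted; the
    # runtime behind group_sum is free to choose partitioning, aggregation
    # strategy, and degree of parallelism.
    def group_sum(rows, key, value):
        out = defaultdict(float)
        for row in rows:
            out[key(row)] += value(row)
        return dict(out)

    print(group_sum(orders, key=lambda r: r[0], value=lambda r: r[1]))
    # {'EMEA': 160.0, 'APAC': 80.0}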
Presenter
Volker Markl
Bio
Volker Markl is a Full Professor and Chair of the Database Systems and
Information Management (DIMA) group at the Technische Universität
Berlin (TU Berlin), as well as an adjunct status-only professor at the
University of Toronto. Earlier in his career, Dr. Markl led a research
group at FORWISS, the Bavarian Research Center for Knowledge-based
Systems in Munich, Germany, and was a Research Staff Member and Project
Leader at the IBM Almaden Research Center in San Jose, California, USA.
Dr. Markl has published numerous research papers on indexing, query
optimization, lightweight information integration, and scalable data
processing. He holds 7 patents, has transferred technology into several
commercial products, and advises several companies and startups. He has
been speaker and principal investigator of the Stratosphere research
project that resulted in the Apache Flink big data analytics system.
Dr. Markl currently serves as the secretary of the VLDB Endowment and
was recently elected as one of Germany's leading "digital minds"
(Digitale Köpfe) by the German Informatics Society (GI).
|
On Wednesday, the keynote addresses are:
Datacenters as Computers: Google Engineering & Database Research Perspectives
| Abstract
In this collaborative keynote address, we will share Google's experience in building a scalable data infrastructure that leverages datacenters for managing Google's advertising data over the last decade. In order to support the massive online advertising platform at Google, the data infrastructure must simultaneously support both transactional and analytical workloads. The focus of this talk will be to highlight how the datacenter architecture and the cloud computing paradigm have enabled us to manage the exponential growth in data volumes and user queries, make our services highly available and fault tolerant to massive datacenter outages, and deliver results with very low latencies. We note that other Internet companies have also undergone similar growth in data volumes and user queries. In fact, this phenomenon has resulted in at least two new terms in the technology lexicon: big data and cloud computing. Cloud computing (and datacenters) have been largely responsible for scaling data volumes from the terabyte range just a few years ago toward the exabyte range expected over the next couple of years. Delivering solutions at this scale that are fault-tolerant, latency sensitive, and highly available requires a combination of research advances and engineering ingenuity at Google and elsewhere. Next, we will try to answer the following question: is a datacenter just another (very large) computer, or does it fundamentally change the design principles for data-centric applications and systems? We will conclude with some of the unique research challenges that need to be addressed in order to sustain continuous growth in data volumes while supporting high throughput and low latencies.
Presenter
Shivakumar Venkataraman and Divyakant Agrawal
Bio
Shivakumar Venkataraman is Vice President of Engineering for Google's
Advertising Infrastructure and Payments Systems. He received his BS in
Computer Science from IIT Madras in 1990, and his MS and PhD in
Computer Science from the University of Wisconsin-Madison in 1991 and
1996, respectively. From 1996 to 2000, he worked as an Advisory
Software Engineer at IBM on the development of IBM's federated query
optimizers and associated technologies. After leaving IBM in 2000, he
worked with Cohera Corporation, PeopleSoft, Required Technologies, and
AdeSoft. He also served as a Visiting Faculty member at UC Berkeley in
2002. He has been with Google since 2003. At Google, Dr. Venkataraman
is recognized for his vision in the development of critical database
technologies: the scalable distributed database management system F1,
the scalable data warehousing solution Mesa, and the scalable
log-processing system Photon, among others.
Divyakant Agrawal is a Professor of Computer Science and the Director
of Engineering Computing Infrastructure at the University of
California at Santa Barbara. His research expertise is in the areas of
database systems, distributed computing, data warehousing, and
large-scale information systems. Divy Agrawal is an ACM Distinguished
Scientist (2010), an ACM Fellow (2012), and an IEEE Fellow (2012). His
current interests are in the areas of scalable data management and data
analysis in cloud computing environments, security and privacy of data
in the cloud, and scalable analytics over social network data and
social media. In 2013-14, he was on a sabbatical leave from UCSB and
served as a Visiting Scientist in the Advertising Infrastructure
Group at Google, Inc. in Mountain View, CA. In 2014-15, he will be on
leave from UCSB and will serve as a Director of Research in Data
Analytics at Qatar Computing Research Institute.
|
On Thursday, the keynote addresses are by the winners of our major awards.
Best Paper Award
- Felix Martin Schuhknecht, Alekh Jindal, Jens Dittrich: The Uncracked Pieces in Database Cracking. 97-108.
- Dawei Jiang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, Sai Wu: epiC: an Extensible and Scalable System for Processing Big Data. 541-552.
- Uwe Jugel, Zbigniew Jerzak, Gregor Hackenbroich, Volker Markl: M4: A Visualization-Oriented Time Series Data Aggregation. 797-808.
- Yannis Klonatos, Christoph Koch, Tiark Rompf, Hassan Chafi: Building Efficient Query Engines in a High-Level Language. 853-864.
- Stefan Funke, Andre Nusser, Sabine Storandt: On k-Path Covers and their Applications. 893-902.
Test-of-Time Award: Probabilistic Databases: The Long View
| Abstract
Ten years ago, we studied how to evaluate queries on
probabilistic databases. We found something we did not expect: some
queries were easy, some were provably hard, and, in restricted cases,
one could draw precisely the boundary between them. We called the
former queries "safe", the latter "unsafe", and presented these
findings in the paper "Efficient Query Evaluation on Probabilistic
Databases".
Subsequent work by several researchers, including ourselves, has
significantly expanded the boundary of safe/unsafe queries to richer
languages. Today, ten years later, probabilistic inference over big
data sets is becoming a central data management problem, due to the
success of large knowledge bases extracted automatically from the Web.
One of the most promising approaches to scalable inference is based on
the safe plans that we introduced ten years ago. In this talk, we will
present our view on how the field has evolved, and what the future
holds for Probabilistic Databases.
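For a flavor of what a safe plan buys, consider the following minimal Python sketch (toy data and names, our own illustration). Over a tuple-independent probabilistic table, the probability that at least one qualifying tuple exists is one minus the product of the complements of the qualifying probabilities, so a safe query can be answered in a single scan rather than by enumerating exponentially many possible worlds.

    # Tuple-independent probabilistic relation R(city, temp): each tuple is
    # present independently with its marginal probability p.
    R = [
        ("Seattle", 18, 0.9),
        ("Seattle", 25, 0.4),
        ("Tucson",  35, 0.7),
    ]

    def prob_exists(rows, pred):
        # Extensional evaluation of the Boolean query EXISTS t: pred(t).
        # Under tuple independence, P(exists) = 1 - prod(1 - p_i) over the
        # qualifying tuples -- exactly the aggregation a safe plan performs.
        q = 1.0
        for city, temp, p in rows:
            if pred(city, temp):
                q *= (1.0 - p)
        return 1.0 - q

    # Probability that some Seattle tuple has temp > 20:
    print(prob_exists(R, lambda city, temp: city == "Seattle" and temp > 20))
    # 0.4 (one qualifying tuple, so the answer is its marginal probability)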
Presenter
Nilesh Dalvi and Dan Suciu
Bio
Dr. Nilesh Dalvi is a database researcher whose research spans
several areas of data management, including managing uncertainty in
databases, information integration, information extraction, and
crowdsourcing. He received his B.S. in Computer Science from IIT
Bombay, Mumbai, in 2001, and his Ph.D. from University of Washington,
Seattle in 2007. Dr. Dalvi has held the positions of Research
Scientist at Yahoo! and Facebook. In 2008, he received an honorable
mention for the ACM SIGMOD Best Dissertation Award. He has served as a
Program Chair for the International Conference on Very Large
Databases, 2012, and has published over 50 articles in the field of
Data Management. Dr. Dalvi currently serves as the Chief Scientist at
Trooly Inc.
Dan Suciu is a Professor in Computer Science at the University of Washington. He received his Ph.D. from the University of Pennsylvania in 1995, was a principal member of the technical staff at AT&T Labs, and joined the University of Washington in 2000. Suciu conducts research in data management, with an emphasis on topics related to Big Data and data sharing, such as probabilistic data, data pricing, parallel data processing, and data security. He is a co-author of two books: Data on the Web: from Relations to Semistructured Data and XML (1999) and Probabilistic Databases (2011). He is a Fellow of the ACM, holds twelve US patents, received the best paper award at SIGMOD 2000 and ICDT 2013, the ACM PODS Alberto Mendelzon Test of Time Award in 2010 and in 2012, and the 10 Year Most Influential Paper Award at ICDE 2013, and is a recipient of the NSF CAREER Award and of an Alfred P. Sloan Fellowship. Suciu serves on the VLDB Board of Trustees, is an associate editor for the VLDB Journal, ACM TOIS, ACM TWEB, and Information Systems, and is a past associate editor for ACM TODS. Suciu's PhD students Gerome Miklau and Christopher Re received the ACM SIGMOD Best Dissertation Award in 2006 and 2010, respectively, and Nilesh Dalvi was a runner-up in 2008.
|
Engineering High-Performance Database Engines
| Abstract
Developing a database engine is both challenging and rewarding. Database engines are very complex software artifacts that have to scale to large data sizes and large hardware configurations, and developing such systems usually means choosing between different trade-offs at various points of development. This talk surveys two different database engines: the disk-based SPARQL-processing engine RDF-3X and the relational main-memory engine HyPer. It discusses the design choices that were made during development and highlights optimization techniques that are important for both systems.
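One recurring technique behind HyPer's speed, mentioned in the bio below, is compiling queries into efficient code rather than interpreting operator trees. The following hypothetical Python sketch illustrates the principle in miniature (HyPer itself generates LLVM/machine code; this is only an illustration): a filter predicate is baked into a specialized scan function once, removing per-tuple interpretation overhead.

    def compile_scan(predicate_src):
        # predicate_src is a Python expression over columns 'a' and 'b',
        # e.g. "a > 10 and b == 'x'". In a real engine it would come from an
        # optimized query plan, not a raw string.
        src = (
            "def scan(table):\n"
            "    out = []\n"
            "    for a, b in table:\n"
            f"        if {predicate_src}:\n"
            "            out.append((a, b))\n"
            "    return out\n"
        )
        namespace = {}
        exec(src, namespace)  # generate and compile the specialized scan once
        return namespace["scan"]

    table = [(5, "x"), (12, "x"), (20, "y")]
    scan = compile_scan("a > 10 and b == 'x'")
    print(scan(table))  # [(12, 'x')]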
Presenter
Thomas Neumann
Bio
Thomas Neumann conducts research on database systems, focusing on query
optimization and query processing. As part of that research he has built
two very successful systems: RDF-3X, for efficient processing of large
RDF data, and the very fast main-memory database system HyPer. Their
development yielded many innovative techniques, including advanced join
ordering techniques and efficient query compilation approaches.
He studied business information systems at the University of Mannheim
and received a doctorate in informatics from the same university. Before
joining TUM as a professor, Neumann was a senior researcher at the Max
Planck Institute for Informatics.
|