VLDB2014 - Keynotes

We are pioneering a new collaborative style of keynote address with industrial speakers paired with academic speakers. The speakers are developing their talks in coordination, each in their own way.

On Tuesday, the keynote addresses are:

The Impact of Columnar In-Memory Databases on Enterprise Systems

Hasso Plattner

Abstract
Five years ago I proposed a common database approach for transaction processing and analytical systems using a columnar in-memory database, disputing the common belief that column stores are not suitable for transactional workloads. Today, the concept has been widely adopted in academia and industry and it is proven that it is feasible to run analytical queries on large data sets directly on a redundancy-free schema, eliminating the need to maintain pre-built aggregate tables during data entry transactions. The resulting reduction in transaction complexity leads to a dramatic simplification of data models and applications, redefining the way we build enterprise systems. First analyses of productive applications adopting this concept confirm that system architectures enabled by in-memory column stores are conceptually superior for business transaction processing compared to row-based approaches. Additionally, our analyses show a shift of enterprise workloads to even more read-oriented processing due to the elimination of updates of transaction-maintained aggregates.

Presenter
Hasso Plattner

Bio
Prof. Dr. h.c. mult. Hasso Plattner is one of the co-founders of SAP AG and has been Chairman of the Supervisory Board since May 2003. In this role and as Chief Software Advisor, he concentrates on defining the medium- and long-term technology strategy and direction of SAP. He also heads the Technology Committee of the SAP Supervisory Board. Hasso Plattner received his Master’s Degree in Communications Engineering from the University of Karlsruhe. In 1990, the University of Saarbrücken awarded him an honorary doctorate and in 1994, he was granted an honorary full professorship. In 1997, as chairman of SAP America, Inc., co-chairman of SAP and the chief architect of SAP R/3, Hasso Plattner received the Information Technology Leadership Award for Global Integration as part of the Computerworld Smithsonian Awards Program. In 1998, he was inducted into the German Hall of Fame. In 2002, Hasso Plattner was appointed Honorary Doctor, and in 2004 Honorary Professor by the University of Potsdam. Hasso Plattner also founded the Hasso Plattner Institute (HPI) for IT Systems Engineering at the University of Potsdam in 1998 with the largest single private contribution to a university ever made in Germany. Through his continuing financial support, he is helping the HPI in its efforts to become a center for the education of world-class software specialists.

Breaking the Chains: On Declarative Data Analysis and Data Independence in the Big Data Era

Volker Markl

Abstract
Data management research, systems, and technologies have drastically improved the availability of data analysis capabilities, particularly for non-experts, due in part to low entry barriers and reduced ownership costs (e.g., for data management infrastructures and applications). Major reasons for the widespread success of database systems and today’s multi-billion dollar data management market include data independence, separating physical representation and storage from the actual information, and declarative languages, separating the program specification from its intended execution environment. In contrast, today’s big data solutions do not offer data independence and declarative specification. As a result, big data technologies are mostly employed in newly-established companies with IT-savvy employees or in large well-established companies with big IT departments. We argue that current big data solutions will continue to fall short of widespread adoption, due to usability problems, despite the fact that in-situ data analytics technologies achieve a good degree of schema independence. In particular, we consider the lack of a declarative specification to be a major roadblock, contributing to the scarcity of available data scientists and limiting the application of big data to the IT-savvy industries. Specifically, data scientists currently have to spend a lot of time tuning their data analysis programs for specific data characteristics and a specific execution environment. We believe that the research community needs to bring the powerful concepts of declarative specification to current data analysis systems, in order to achieve broad big data technology adoption and effectively deliver the promise that novel big data technologies offer.

Presenter
Volker Markl

Bio
Volker Markl is a Full Professor and Chair of the Database Systems and Information Management (DIMA) group at the Technische Universität Berlin (TU Berlin) as well as an adjunct status-only professor at the University of Toronto. Earlier in his career, Dr. Markl led a research group at FORWISS, the Bavarian Research Center for Knowledge-based Systems in Munich, Germany, and was a Research Staff member & Project Leader at the IBM Almaden Research Center in San Jose, California, USA. Dr. Markl has published numerous research papers on indexing, query optimization, lightweight information integration, and scalable data processing. He holds 7 patents, has transferred technology into several commercial products, and advises several companies and startups. He has been speaker and principal investigator of the Stratosphere research project that resulted in the "Apache Flink" big data analytics system. Dr. Markl currently serves as the secretary of the VLDB Endowment and was recently elected as one of Germany's leading "digital minds" (Digitale Köpfe) by the German Informatics Society (GI).

On Wednesday, the keynote addresses are:

Datacenters as Computers: Google Engineering & Database Research Perspectives

Shivakumar Venkataraman

Divyakant Agrawal

Abstract
In this collaborative keynote address, we will share Google's experience in building a scalable data infrastructure that leverages datacenters for managing Google's advertising data over the last decade. In order to support the massive online advertising platform at Google, the data infrastructure must simultaneously support both transactional and analytical workloads. The focus of this talk will be to highlight how the datacenter architecture and the cloud computing paradigm have enabled us to manage the exponential growth in data volumes and user queries, make our services highly available and fault tolerant to massive datacenter outages, and deliver results with very low latencies. We note that other Internet companies have also undergone similar growth in data volumes and user queries. In fact, this phenomenon has resulted in at least two new terms in the technology lexicon: big data and cloud computing. Cloud computing (and datacenters) have been largely responsible for scaling the data volumes from the terabyte range just a few years ago to reaching the exabyte range over the next couple of years. Delivering solutions at this scale that are fault-tolerant, latency sensitive, and highly available requires a combination of research advances with engineering ingenuity at Google and elsewhere. Next, we will try to answer the following question: is a datacenter just another (very large) computer? Or does it fundamentally change the design principles for data-centric applications and systems? We will conclude with some of the unique research challenges that need to be addressed in order to sustain continuous growth in data volumes while supporting high throughput and low latencies.

Presenter
Shivakumar Venkataraman and Divyakant Agrawal

Bio
Shivakumar Venkataraman is Vice President of Engineering for Google's Advertising Infrastructure and Payments Systems. He received his BS in Computer Science from IIT, Madras in 1990 and received his MS and PhD in Computer Science from the University of Wisconsin at Madison in 1991 and 1996 respectively. From 1996 to 2000, he worked as an Advisory Software Engineer with IBM working on the development of IBM's federated query optimizers and associated technologies. After leaving IBM in 2000, he worked with Cohera Corporation, PeopleSoft, Required Technologies, and AdeSoft. He also served as a Visiting Faculty member at UC Berkeley in 2002. He has been with Google since 2003. At Google, Dr. Venkataraman is recognized for his vision in the development of critical technologies for databases: the scalable distributed database management system F1, the scalable data warehousing solution Mesa, and the scalable log-processing system Photon, among others.

Divyakant Agrawal is a Professor of Computer Science and the Director of Engineering Computing Infrastructure at the University of California at Santa Barbara. His research expertise is in the areas of database systems, distributed computing, data warehousing, and large-scale information systems. Divy Agrawal is an ACM Distinguished Scientist (2010), an ACM Fellow (2012), and an IEEE Fellow (2012). His current interests are in the areas of scalable data management and data analysis in cloud computing environments, security and privacy of data in the cloud, and scalable analytics over social network data and social media. In 2013-14, he was on sabbatical leave from UCSB and served as a Visiting Scientist in the Advertising Infrastructure Group at Google, Inc. in Mountain View, CA. In 2014-15, he will be on leave from UCSB and will serve as a Director of Research in Data Analytics at Qatar Computing Research Institute.

On Thursday, the keynote addresses are by the winners of our major awards.

Best Paper Award

Felix Martin Schuhknecht, Alekh Jindal, Jens Dittrich:
The Uncracked Pieces in Database Cracking. 97-108.

Dawei Jiang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, Sai Wu:
epiC: an Extensible and Scalable System for Processing Big Data. 541-552.

Uwe Jugel, Zbigniew Jerzak, Gregor Hackenbroich, Volker Markl:
M4: A Visualization-Oriented Time Series Data Aggregation. 797-808.

Yannis Klonatos, Christoph Koch, Tiark Rompf, Hassan Chafi:
Building Efficient Query Engines in a High-Level Language. 853-864.

Stefan Funke, Andre Nusser, Sabine Storandt:
On k-Path Covers and their Applications. 893-902.

Test-of-time Award: Probabilistic Databases: the Long View

Nilesh Dalvi

Dan Suciu

Abstract
Ten years ago, we studied how to evaluate queries on probabilistic databases. We found something we did not expect: some queries were easy, some were provably hard, and, in restricted cases, one could draw precisely the boundary between them. We called the former queries "safe", the latter "unsafe", and presented these findings in the paper "Efficient Query Evaluation on Probabilistic Databases". Subsequent work by several researchers, including ourselves, has significantly expanded the boundary of safe/unsafe queries to richer languages. Today, ten years later, probabilistic inference over big data sets is becoming a central data management problem, due to the success of large knowledge bases extracted automatically from the Web. One of the most promising approaches to scalable inference is based on the safe plans that we introduced ten years ago. In this talk, we will present our view on how the field has evolved, and what the future holds for Probabilistic Databases.

Presenter
Nilesh Dalvi and Dan Suciu

Bio
Dr. Nilesh Dalvi is a Database researcher whose research spans several areas of Data Management including Managing Uncertainty in Databases, Information Integration, Information Extraction and Crowdsourcing. He received his B.S. in Computer Science from IIT Bombay, Mumbai, in 2001, and his Ph.D. from University of Washington, Seattle in 2007. Dr. Dalvi has held the positions of Research Scientist at Yahoo! and Facebook. In 2008, he received an honorable mention for the ACM SIGMOD Best Dissertation Award. He has served as a Program Chair for the International Conference on Very Large Databases, 2012, and has published over 50 articles in the field of Data Management. Dr. Dalvi currently serves as the Chief Scientist at Trooly Inc.

Dan Suciu is a Professor in Computer Science at the University of Washington. He received his Ph.D. from the University of Pennsylvania in 1995, was a principal member of the technical staff at AT&T Labs and joined the University of Washington in 2000. Suciu is conducting research in data management, with an emphasis on topics related to Big Data and data sharing, such as probabilistic data, data pricing, parallel data processing, and data security. He is a co-author of two books: Data on the Web: from Relations to Semistructured Data and XML, 1999, and Probabilistic Databases, 2011. He is a Fellow of the ACM, holds twelve US patents, received the best paper award in SIGMOD 2000 and ICDT 2013, the ACM PODS Alberto Mendelzon Test of Time Award in 2010 and in 2012, the 10 Year Most Influential Paper Award in ICDE 2013, and is a recipient of the NSF Career Award and of an Alfred P. Sloan Fellowship. Suciu serves on the VLDB Board of Trustees, is an associate editor for the VLDB Journal, ACM TOIS, ACM TWEB, and Information Systems, and is a past associate editor for ACM TODS. Suciu's PhD students Gerome Miklau and Christopher Re received the ACM SIGMOD Best Dissertation Award in 2006 and 2010 respectively, and Nilesh Dalvi was a runner up in 2008.

Engineering High-Performance Database Engines

Thomas Neumann

Abstract
Developing a database engine is both challenging and rewarding. Database engines are very complex software artifacts that have to scale to large data sizes and large hardware configurations, and developing such systems usually means choosing between different trade-offs at various points of development. This paper gives a survey of two different database engines, the disk-based SPARQL-processing engine RDF-3X, and the relational main-memory engine HyPer. It discusses the design choices that were made during development, and highlights optimization techniques that are important for both systems.

Presenter
Thomas Neumann

Bio
Thomas Neumann conducts research on database systems, focusing on query optimization and query processing. As part of that research he has built two very successful systems, RDF-3X for efficient processing of large RDF data, and the very fast main-memory database system HyPer. Their development led to many innovative techniques, including advanced join ordering techniques and efficient query compilation approaches. He studied business information systems at the University of Mannheim and received a doctorate in informatics from the same university. Before joining TUM as professor, Neumann was a senior researcher at the Max
