VLDB 2021: Round Table Sessions
Panelists: (Greenplum), (Databricks), (Google), (PingCAP) Abstract: With the rising need for real-time analytics, HTAP is attracting increasing attention in both academia and industry. In this session we discuss the different designs of HTAP architectures and the future forms of HTAP systems. We invited four industry practitioners who focus on this area.
Panelists: (Oracle Senior Director, Blockchain Product Management), (UC Davis), (Azure SQL Database Group, Microsoft), (Syracuse University) Abstract: At its core, a blockchain is a store of information and thus a database. In practice, blockchain deployments range over a continuum from fully public, decentralized systems (such as Ethereum and Bitcoin) to components of a private enterprise information system (often, though not exclusively, Hyperledger Fabric). There are good reasons for blockchains at various points on this public-private continuum, and significant opportunities for our research and development communities, both for new research and for the application of our prior research and systems. The panelists represent this full range of opportunities across these domains.
(UAI, Santiago, Chile),
(University of Lyon, France),
(University of Michigan, USA),
This panel will explore our ability to conduct research that improves diversity and inclusion.
Sihem Amer-Yahia is Research Director at CNRS. She is head of the Scalable Data Exploration and Data
Ethics Group and Deputy Director of the Laboratoire d’Informatique de Grenoble in France.
Leopoldo Bertossi is a Full Professor at the Faculty of Engineering and Sciences, "Universidad Adolfo Ibáñez" (UAI, Santiago, Chile), where he is the Director of the Graduate Programs in Data Science.
Angela Bonifati is a Professor of Computer Science at Lyon 1 University and the head of the Database group at the CNRS Liris research lab. Since 2020, she has also been an Adjunct Professor in the Data Systems group at the University of Waterloo in Canada.
H. V. Jagadish is the Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science at the University of Michigan in Ann Arbor, and Director of the Michigan Institute for Data Science. Prior to 1999, he was Head of the Database Research Department at AT&T Labs, Florham Park, NJ.
Victor Zakhary is a Senior Member of Technical Staff at Oracle. He works on blockchains, distributed systems, data privacy and security, and privacy of social network users.
Panelists: (Columbia), (Amplify Partners), (Megagon), (Microsoft Research), (University of Washington) Abstract: Interactive querying and visualization play a pivotal role in data analysis processes, where the first task of a data analyst is to develop an understanding of the data at hand and subsequently to formulate hypotheses and validate them by testing them over the data to gain insights. The interactions usually take the form of direct manipulation of the visualization and/or rely on a set of classic interaction widgets (e.g., sliders, buttons). Such interactive visual systems are typically coupled with data management systems. Interactive visual systems are also proving increasingly valuable in preventing confusion, understanding errors, and explaining the behavior of machine learning algorithms, since users can explore and manipulate until they are satisfied with their understanding. This panel will concentrate on recent advances and remaining limitations of interactive data visualization, including crucial factors such as the requirement of real-time response to the user's interactions.
Panelists: (Oracle), (Facebook), (Microsoft), (Ottertune), (Microsoft) Abstract: Ever since the arrival of the first database management systems (DBMSs) in the 1970s, people have dreamed of having a system that could handle all aspects of configuring, tuning, and optimizing itself. Recently, researchers have applied modern machine learning (ML) methods for automated physical database design, knob tuning, query optimization, and resource provisioning. The allure of ML is that it can potentially uncover patterns and handle complex problems beyond the abilities of humans. ML can also use information collected from tuning previous databases and apply that knowledge to new databases in the future. This VLDB 2021 panel will discuss the challenges, successes, and failures of using ML for automated database tuning.
Panelists: (University of Waterloo), (University of Hong Kong), (TigerGraph), (Ohio State University), (Linkedin) Abstract: Graph data management and analytics provide powerful insights into how to unlock the connections between data elements and the value that they hold. Due to this power, techniques for managing and analyzing graphs are becoming an increasingly popular topic in both academia and industry. In this roundtable session, we invite a group of experts from industry and academia to discuss trending topics in the field, including (1) critical components/functionalities of graph data management systems and how these benefit industry applications, (2) current trends in graph embedding and their applications to graph analytics, (3) modern hardware in enhancing the efficiency of graph analytics, and (4) ML for graph data management and analytics.
Panelists: (University of Melbourne), (Renmin University), (Aalborg University), (Alibaba Group) Abstract: In this session, the panel members will share their views and thoughts on the research direction of learning-based algorithms. In particular, they will discuss the challenges and opportunities.
Panelists: (Simon Fraser University), (Stony Brook University), (Renmin University), (TU Munich), (Tsinghua University), (Stanford University), (JD.COM) Abstract: The graph mining panorama has lately been dominated by methods that find intermediary representations of nodes, known as graph embeddings, and by graph neural networks. Are these technologies the panacea of graph mining, or is there still a long way to go in solving more traditional graph mining problems? In this roundtable we will discuss the limitations and challenges of these approaches, especially considering robustness, guarantees, and scalability issues.
Panelists: (URC), (AAU), (Alibaba), (TUB) Abstract: Recent years have seen a significant increase in collections of spatio-temporal data that span a multitude of domains such as earth observation, transportation, smart cities, and autonomous vehicles. Big spatial data from these domains have posed new challenges to the research and implementation of spatio-temporal data systems. In this session, several colleagues will share their views and insights on how to address those challenges from both research and industry perspectives.
Panelists: (TU Berlin), (SFU), (Singularity Data), (NUS) Abstract: The emergence of modern hardware has sparked interest in building data management systems with new hardware technologies, including new computing units and novel storage and networking devices. In this roundtable session, we invite a group of young scholars from industry and academia to discuss the opportunities as well as the challenges of adopting modern hardware for large-scale data management.
(University of Rochester),
(University of Chicago),
(Simon Fraser University)
In this roundtable discussion, we invited four experts in data preparation, data marketing, data
integration, and data cleaning to discuss the cutting-edge research in scalable data curation.
In particular, we will discuss (1) what are your current and long-term research goals, (2) what do you think of the recent data-centric campaign and how can we contribute to it, and (3) how can we make real-world impact based on our research?
Panelists: (University of Pennsylvania), (New York University), (Microsoft), (Oracle) Abstract: Join us for this coffee style session where we will discuss anything from the impact of COVID on our lives, to how to choose a promising research topic or establish a fruitful collaboration.
(University of Stuttgart)
Data governance describes the capability of an organization to manage its data to ensure high
data quality throughout the complete data life cycle. As such, data governance necessitates data
integration, cleaning, data security (e.g., access control), and accountability for data processing,
collection, and quality.
By providing a record of the operations that led to the creation of a piece of data, and by connecting data to the other data it is derived from, data provenance provides a fundamental fabric for data governance. This panel will discuss the interconnection between data governance and data provenance.
Panelists: (Microsoft Research), (INRIA & Ecole Polytechnique), (Intel Labs & MIT), (Eurecom), (IBM), (Alibaba Cloud), (Amazon AWS) Abstract: In this session, the panelists will share their views on the state of reproducibility and artifact availability in our community. We will discuss the current status, what we as a community are doing well, and what should be done in a new and better way. We expect to have a diversity of opinions and to come away with several interesting ideas about how our community should move forward with assessing the reproducibility of data management research.
(University of Maryland, Microsoft Research),
(University of Maryland),
(University of Montréal),
(Microsoft Research, Cornell University),
(University of Washington),
(University of Michigan)
What role can data management systems play in facilitating responsible AI systems and experiences?
What are the major open research questions in responsible AI?
Does responsible AI entail fundamentally new capabilities, or a refinement of our professional practice and how we engage?
This round table starts from a very short intro to the topic (5 min) by the chairs.
Then each panelist has the chance to give their take on data preparation for ML and perhaps briefly
showcase their own research in that field. (5 minutes each = 20 minutes total)
For the remaining time, we will have an open roundtable regarding (part of) the following questions:
1. What is the difference between traditional data preparation and data preparation for ML?
2. What are the most important and challenging problems in data preparation for ML?
3. What is a good success metric, and what would be a good benchmark, for assessing data preparation for ML?
4. What are your thoughts on feature stores and AutoML? Where in the data prep for ML pipeline do you see them?
5. The discussion so far has implicitly assumed that data prep is for improving ML model performance. We know that new considerations, such as fairness and privacy, play an important role. Is the database community well positioned to tackle these challenges? And what other angles are you aware of that do not receive enough consideration from our community?
6. To what extent is data preparation for ML a DB problem vs. an ML problem? What are the areas where the two research directions could converge?