VLDB 2021: Round Table Sessions

Date: 17 Aug

Time 22:00 - 23:00 CEST Hybrid Transaction and Analytical Processing (HTAP) Chaired by Xiaoyu Ma (PingCAP)

Panelists: Yu Yang (Greenplum), Tathagata Das (Databricks), Jiacheng Yang (Google), Dongxu Huang (PingCAP) Abstract: With the rising need for real-time analytics, HTAP is attracting more and more attention in both academia and industry. In this session we discuss the different designs of an HTAP architecture and the future forms of HTAP systems. We have invited four industry practitioners who focus on this area.

Time 22:00 - 23:00 CEST Blockchain Database Chaired by Hank Korth (Lehigh University) and Jianliang Xu (HKBU)

Panelists: Mark Rakmilevich (Oracle Senior Director, Blockchain Product Management), Mo Sadoghi (UC Davis), Panagiotis Antonopoulos (Azure SQL Database Group, Microsoft), Yuzhe Tang (Syracuse University) Abstract: At its core, a blockchain is a store of information and thus a database. In practice, blockchain deployments range over a continuum from fully public, decentralized systems (such as Ethereum and Bitcoin) to components of a private enterprise information system (often, though not exclusively, Hyperledger Fabric). There are good reasons for blockchains at various points on this public-private continuum, and significant opportunities for our research and development communities, both for new research and for applying our prior research and systems. The panelists represent this full range of opportunities across these domains.

Time 22:00 - 23:00 CEST Are we D&I? Chaired by Sihem Amer-Yahia (Université Grenoble Alpes)

Panelists: Leopoldo Bertossi (UAI, Santiago, Chile), Angela Bonifati (University of Lyon, France), H V Jagadish (University of Michigan, USA), Victor Zakhary (Oracle, USA) Abstract: This panel will explore our ability to conduct research that improves diversity and inclusion.
Short bios:
Sihem Amer-Yahia is Research Director at CNRS. She is head of the Scalable Data Exploration and Data Ethics Group and Deputy Director of the Laboratoire d’Informatique de Grenoble in France.
Leopoldo Bertossi is a Full Professor at the Faculty of Engineering and Sciences, "Universidad Adolfo Ibáñez" (UAI, Santiago, Chile), where he is the Director of the Graduate Programs in Data Science.
Angela Bonifati is a Professor of Computer Science at Lyon 1 University and the head of the Database group at the CNRS Liris research lab. Since 2020, she has also been an Adjunct Professor in the Data Systems group at the University of Waterloo in Canada.
H. V. Jagadish is Bernard A Galler Collegiate Professor of Electrical Engineering and Computer Science at the University of Michigan in Ann Arbor, and Director of the Michigan Institute for Data Science. Prior to 1999, he was Head of the Database Research Department at AT&T Labs, Florham Park, NJ.
Victor Zakhary is Senior Member of Technical Staff at Oracle. He works on blockchains, distributed systems, data privacy and security, and privacy of social network users.

Time 22:00 - 23:00 CEST Interactive Querying and Visualization for Large Data Chaired by Tiziana Catarci (University of Rome), Dominik Moritz (CMU, Apple) and Anna Fariha (Microsoft)

Panelists: Eugene Wu (Columbia), Sarah Catanzaro (Amplify Partners), Sajjadur Rahman (Megagon), Tarique Siddiqui (Microsoft Research), Leilani Battle (University of Washington) Abstract: Interactive querying and visualization play a pivotal role in data analysis, where the first task of a data analyst is to comprehend the data at hand, then formulate hypotheses and validate them by testing them over the data to gain insights. The interactions usually take the form of direct manipulation of the visualization and/or rely on a set of classic interaction widgets (e.g., sliders, buttons, etc.). Such interactive visual systems are typically coupled with data management systems. Interactive visual systems are also proving increasingly valuable in preventing confusion, understanding errors, and explaining the behavior of machine learning algorithms, since users can explore and manipulate until they are satisfied with their understanding. This panel will concentrate on recent advances and remaining limits of interactive data visualization, including crucial factors such as the requirement of real-time response to the user's interactions.

Time 22:00 - 23:00 CEST Magical Machine Learning for Database Tuning Chaired by Andy Pavlo (CMU)

Panelists: Weiwei Gong (Oracle), Sam Lightstone (Facebook), Surajit Chaudhuri (Microsoft), Dana Van Aken (Ottertune), Bailu Ding (Microsoft) Abstract: Ever since the arrival of the first database management systems (DBMSs) in the 1970s, people have dreamed of having a system that could handle all aspects of configuring, tuning, and optimizing itself. Recently, researchers have applied modern machine learning (ML) methods for automated physical database design, knob tuning, query optimization, and resource provisioning. The allure of ML is that it can potentially uncover patterns and handle complex problems beyond the abilities of humans. ML can also use information collected from tuning previous databases and apply that knowledge to new databases in the future. This VLDB 2021 panel will discuss the challenges, successes, and failures of using ML for automated database tuning.

Time 22:00 - 23:00 CEST Graph Data Management and Analytics Chaired by Laks V. S. Lakshmanan (The University of British Columbia) and Wenjie Zhang (University of New South Wales)

Panelists: Semih Salihoglu (University of Waterloo), Reynold Cheng (University of Hong Kong), Mingxi Wu (TigerGraph), Srinivasan Parthasarathy (Ohio State University), Bogdan Arsintescu (LinkedIn) Abstract: Graph data management and analytics provide powerful insights into how to unlock the connections between data elements and the value that they hold. Due to this power, techniques for managing and analyzing graphs are becoming an increasingly popular topic in both academia and industry. In this roundtable session, we invite a group of experts from industry and academia to discuss trending topics in the field, including (1) critical components/functionalities of graph data management systems and how these benefit industry applications, (2) current trends in graph embedding and their applications to graph analytics, (3) modern hardware in enhancing the efficiency of graph analytics, and (4) ML for graph data management and analytics.

Time 22:00 - 23:00 CEST Systems for ML Chaired by Arun Kumar (UCSD) and Chris Jermaine (Rice University)

Panelists: Ce Zhang (ETH Zurich), Sebastian Schelter (University of Amsterdam), Paroma Varma (Snorkel), Matei Zaharia (Stanford and Databricks), Matthias Boehm (TU Graz), Carlo Curino (Microsoft), Jun Yang (Duke University), Manasi Vartak (Verta) Abstract: All things data systemsy for all things ML/AI.

Date: 18 Aug

Time 08:00 - 08:50 CEST Learning based Algorithms Chaired by Junhao Gan (University of Melbourne) and Wei Wang (HKUST)

Panelists: Renata Borovica-Gajic (University of Melbourne), Ju Fan (Renmin University), Bin Yang (Aalborg University), Yaliang Li (Alibaba Group) Abstract: In this session, the panel members will share their views and thoughts on the research direction of learning-based algorithms. In particular, they will discuss the challenges and opportunities.

Time 08:00 - 08:50 CEST Graph Embedding and Mining Chaired by Davide Mottin (Aarhus University) and Sibo Wang (CUHK)

Panelists: Jian Pei (Simon Fraser University), Steve Skiena (Stony Brook University), Zhewei Wei (Renmin University), Stephan Günnemann (TU Munich), Peng Cui (Tsinghua University), Jure Leskovec (Stanford University), Lingfei Wu (JD.COM) Abstract: The graph mining panorama has lately been dominated by methods that find intermediary representations of nodes, known as graph embeddings, and by graph neural networks. Are these technologies the panacea of graph mining, or is there still a long way to go to solve more traditional graph mining problems? In this roundtable we will discuss the limitations and challenges of these approaches, especially considering robustness, guarantees, and scalability issues.

Time 08:00 - 08:50 CEST Spatio-temporal Data System Chaired by Mohamed Sarwat (Arizona State University) and Hua Lu (Roskilde University)

Panelists: Ahmed Eldawy (UCR), Christian S. Jensen (AAU), Feifei Li (Alibaba), Eleni Tzirita Zacharatou (TUB) Abstract: Recent years have seen a significant increase in collections of spatio-temporal data that span a multitude of domains such as earth observation, transportation, smart cities and autonomous vehicles. Big spatial data from those domains has posed new challenges to the research and implementation of spatio-temporal data systems. In this session, several colleagues will share their views and insights on how to address those challenges from both research and industry perspectives.

Time 08:00 - 08:50 CEST Data Management on Modern Hardware Chaired by Bingsheng He (NUS) and Yuchen Li (SMU)

Panelists: Steffen Zeuch (TU Berlin), Tianzheng Wang (SFU), Yingjun Wu (Singularity Data), Shengliang Lu (NUS) Abstract: The emergence of modern hardware has sparked interest in building data management systems with the new hardware technologies. These include new computing units and novel storage and networking devices. In this roundtable session, we invite a group of young scholars from industry and academia to discuss the opportunities as well as the challenges of adopting modern hardware for large-scale data management.

Date: 19 Aug

Time 22:00 - 23:00 CEST Scalable Data Curation (Integration and Cleaning) Chaired by Dong Deng (Rutgers)

Panelists: Lei Cao (MIT CSAIL), Fatemeh Nargesian (University of Rochester), Raul Castro Fernandez (University of Chicago), Jiannan Wang (Simon Fraser University) Abstract: In this roundtable discussion, we invited four experts in data preparation, data marketing, data integration, and data cleaning to discuss the cutting-edge research in scalable data curation.
In particular, we will discuss: (1) What are your current and long-term research goals? (2) What do you think of the recent data-centric campaign, and how can we contribute to it? (3) How can we make real-world impact based on our research?

Time 22:00 - 23:00 CEST Women in Databases: What does it mean? Chaired by Renata Borovica-Gajic (University of Melbourne) and Fatma Ozcan (Google)

Panelists: Susan Davidson (University of Pennsylvania), Julia Stoyanovich (New York University), Yuanyuan Tian (Microsoft), Danica Porobic (Oracle) Abstract: Join us for this coffee-style session, where we will discuss anything from the impact of COVID on our lives to how to choose a promising research topic or establish a fruitful collaboration.

Time 22:00 - 23:00 CEST Data Governance and Provenance Chaired by Sudeepa Roy (Duke University) and Boris Glavic (Illinois Institute of Technology)

Panelists: Pierre Senellart (INRIA), Eugene Wu (Columbia University), Lukas Rupprecht (IBM), Melanie Herschel (University of Stuttgart) Abstract: Data governance describes the capability of an organization to manage its data to ensure high data quality throughout the complete data life cycle. As such, data governance necessitates data integration, cleaning, data security (e.g., access control), and accountability for data processing, collection, and quality.
By providing a record of the operations that lead to the creation of a piece of data, and by connecting data to the other data it is derived from, data provenance provides a fundamental fabric for data governance. This panel will discuss the interconnection between data governance and data provenance.

Time 22:00 - 23:00 CEST Reproducibility and/or Availability Chaired by Peter Triantafillou (University of Warwick) and Manos Athanassoulis (Boston University)

Panelists: Badrish Chandramouli (Microsoft Research), Ioana Manolescu (INRIA & Ecole Polytechnique), Nesime Tatbul (Intel Labs & MIT), Raja Appuswamy (Eurecom), Rajesh Bordawekar (IBM), Xuntao Cheng (Alibaba Cloud), Yannis Papakonstantinou (Amazon AWS) Abstract: In this session, the panelists will share their views on the state of reproducibility and artifact availability in our community. We will discuss the current status, what we as a community are doing well, and what should be done in a new and better way. We expect to have a diversity of opinions and to come out with several interesting ideas about how our community should move forward with assessing the reproducibility of data management research.

Time 22:00 - 23:00 CEST Responsible AI Systems and Experiences Chaired by Abolfazl Asudeh (University of Illinois at Chicago) and Bill Howe (University of Washington)

Panelists: Hal Daumé III (University of Maryland, Microsoft Research), Katie Shilton (University of Maryland), Golnoosh Farnadi (University of Montréal), Solon Barocas (Microsoft Research, Cornell University), Bernease Herman (University of Washington), Jenn Wortman Vaughan (Microsoft Research), Yuval Moskovitch (University of Michigan) Abstract: What role can data management systems play in facilitating responsible AI systems and experiences?
What are the major open research questions in responsible AI?
Does responsible AI entail fundamentally new capabilities, or a refinement of our professional practice and how we engage?

Time 22:00 - 23:00 CEST Data Preparation for ML Chaired by Ziawasch Abedjan and Xu Chu

Panelists: Yeye He (Microsoft), Paolo Papotti (EURECOM), Renee Miller (Northeastern University), Juliana Freire (NYU) Abstract: This round table starts with a very short intro to the topic (5 min) by the chairs. Then each panelist has the chance to give their take on data prep for ML and perhaps briefly showcase their own research in that field (5 minutes each = 20 minutes total).
For the remaining time, we will have an open roundtable regarding (part of) the following questions:
1. What is the difference between traditional data preparation and data preparation for ML?
2. What are the most important and challenging problems in data preparation for ML?
3. What is a good success metric, and what would be a good benchmark to assess data preparation for ML?
4. What are your thoughts on feature stores and AutoML? Where in the data prep for ML pipeline do you see them?
5. All the discussion so far implicitly assumed that data prep is for improving ML model performance. We know that new considerations, such as fairness and privacy, play an important role. Is the database community well positioned to tackle these challenges? And what other angles are you aware of that do not receive enough consideration from our community?
6. To what extent is data preparation for ML a DB problem vs. an ML problem? What are the areas where both research directions could converge?