Keynote 1
Session Chair: Themis Palpanas (Université Paris Cité)
Alphabets, Grammars, Calculators, and the End of Hand-Crafted Systems
Stratos Idreos, Gordon McKay Professor of Computer Science, Harvard John A. Paulson School of Engineering & Applied Sciences, Harvard University
Abstract:
The AI revolution is transforming every scientific field and business sector, driving an unprecedented demand for data‑centric computation. As new data types, hardware platforms, and workloads appear faster than ever before, the backbone systems that power this revolution must evolve just as quickly. Yet a single system architecture—whether tuned for computing analytics, generative AI, or machine learning—faces a design space larger than 10¹⁰⁰ alternatives, and we still cling to a handful of “good” templates that each require years of manual design and implementation tuning. It is time to abandon this artisanal practice and embrace self-designing systems: systems that can reason about and refactor their own architecture. We show that by modeling the design space of systems as an alphabet of low‑level design primitives and whole architectures as sentences in a grammar over that alphabet, “systems calculators” can now synthesize fresh systems blueprints on demand. The Data Calculator explores trillions of previously unknown data‑structure variants to pick an optimal layout; Cosine and Limousine generate novel NoSQL stores that run up to three orders of magnitude faster than today’s best deployments; the Image Calculator co‑designs entirely new storage formats and neural networks to speed vision pipelines by 10×; and LegoAI and TorchTitan invent novel distributed‑training algorithms for large AI models that extract every flop and byte from modern accelerators. These results signal a future in which systems research increasingly focuses on crafting richer alphabets and grammars while machines write the sentences, freeing designers and researchers to pursue more profound questions and enabling practitioners to dial in cost, latency, and accuracy with surgical precision.
Bio:
Stratos Idreos is the Gordon McKay Professor of Computer Science at Harvard’s John A. Paulson School of Engineering and Applied Sciences and serves as Faculty Co-Director of the Harvard Data Science Initiative. Stratos leads DASlab, the Harvard Data Systems Laboratory. His research pursues a “grammar of data systems,” enabling machines—not humans—to design and tune systems architectures, resulting in systems that are tailored to their context, faster and more scalable. Stratos’s work has been recognized by the community with honors such as the ACM SIGMOD Jim Gray Dissertation and ERCIM Cor Baayen awards (2011), IEEE TCDE Rising Star (2015), NSF CAREER and DOE Early Career awards, the ACM SIGMOD Contributions Award (2020) and Test-of-Time Award (2022), as well as a Sloan Research Fellowship and Harvard’s McDonald Mentoring Award (2023). He has co-chaired ACM SIGMOD 2021 and IEEE ICDE 2022, co-founded the ACM/IMS Journal of Data Science, and currently serves as the chair of the ACM SoCC Steering Committee.
Keynote 2
Session Chair: Nesime Tatbul (Intel Labs and MIT)
Bridging Disciplines in Data Management Research to Solve Complex Data Problems
Juliana Freire, Institute Professor at the Tandon School of Engineering and Professor of Computer Science and Engineering and Data Science, New York University
Abstract:
Scientific discovery has undergone profound transformations across multiple paradigms, each bringing new data challenges whose solutions demand bridging multiple areas of computer science. This talk presents a research journey spanning three scientific paradigms and projects that illustrate how domain-driven problems reveal fundamental data management challenges and drive interdisciplinary innovation. The journey runs from the need to manage complex pipelines and their provenance in computational science (3rd paradigm), through new requirements that arise in data-driven discovery (4th paradigm) to support visual exploration of large-scale spatio-temporal data, to today's AI-powered discovery paradigm (5th paradigm), where AI enables effective and general approaches to the long-standing data integration problem. The projects share a common pattern: complex scientific challenges demand more than single-discipline solutions, and by embracing collaboration across computer science areas and working closely with domain experts, we can identify fundamental research opportunities that lead to both methodological advances and systems with real-world impact.
Bio:
Juliana Freire is an Institute Professor at the Tandon School of Engineering and Professor of Computer Science and Data Science at New York University, where she co-directs the Visualization Imaging and Data Analysis (VIDA) Center. Her research develops methods and systems that enable a wide range of users to obtain trustworthy insights from data. It spans topics in large-scale data analysis and integration, visualization, machine learning, provenance management, and web information discovery, addressing application areas including urban analytics, predictive modeling, computational reproducibility, and biomedical data harmonization. She has co-authored over 250 papers, including 12 award winners and a test-of-time award. She served as elected chair of ACM SIGMOD and as a council member of the Computing Community Consortium (CCC), and was the NYU lead investigator for the Moore-Sloan Data Science Environment. She is a Fellow of the ACM and AAAS, and a winner of the ACM SIGMOD Contributions Award. Her work has been supported by funding agencies and industry partners including the National Science Foundation, DARPA, ARPA-H, Department of Energy, National Institutes of Health, and technology companies including Google, Amazon, Microsoft Research, and IBM. Freire received her Ph.D. and M.Sc. degrees in computer science from the State University of New York at Stony Brook and her B.S. degree in computer science from the Federal University of Ceara in Brazil.
Keynote 3
Session Chair: Peter Pietzuch (Imperial College London)
Bringing the Operational and Analytical Worlds Together with Lakebase
Matei Zaharia, CTO and co-founder of Databricks, Associate Professor of Computer Science at UC Berkeley
Abstract:
As database workloads increasingly move into large shared-nothing cloud datacenters, the bits storing operational data, analytical tables, streams, etc. all sit together on the same disks in the cloud. This creates new opportunities to unify the capabilities of operational and analytical systems, while being mindful of “one size fits all” pitfalls. I’ll discuss how Databricks and Neon are exploring this opportunity with Lakebase, an architecture for OLTP DBMSes that leverages open formats and cloud object stores to also enable efficient analytics on the same data and easy interop between the two worlds. Furthermore, since it wouldn’t be a 2025 keynote without AI, I’ll explain how agents are changing the demands on both types of systems: they appear to be producing more “analytics-like” workloads on OLTP databases and more “OLTP-like” workloads on analytical ones, primarily by issuing much larger numbers of small exploratory queries. These trends create many exciting new challenges for the research community.
Bio:
Matei is the CTO and co-founder of Databricks and an Associate Professor of Computer Science at UC Berkeley. He started the Apache Spark project during his Ph.D. program at UC Berkeley in 2009 and has worked on other widely used data and AI software, including MLflow, Delta Lake, Unity Catalog and DSPy. His most recent research covers improved cloud infrastructure and new programming models and optimization methods for AI. Matei’s research was recognized through the 2014 ACM Doctoral Dissertation Award and the U.S. Presidential Early Career Award for Scientists and Engineers (PECASE).