Proceedings of CIDR

Session 1: LLMs for Databases

KathDB: Explainable Multimodal Database Management System with Human-AI Collaboration

Guorui Xiao, Enhao Zhang, Nicole Sullivan, Will Hansen, Magdalena Balazinska

Waiting to Decompress: The Economics of LLM-Based Compression

Andreas Kipf, Tobias Schmidt, Ping-Lin Kuo, Skander Krid, Moritz Rengert, Luca Heller, Andreas Zimmerer, Mihail Stoian, Varun Pandey, Alexander van Renen

BridgeScope: A Universal Toolkit for Bridging Large Language Models and Databases

Lianggui Weng, Dandan Liu, Rong Zhu, Bolin Ding, Jingren Zhou

Making Prompts First-Class Citizens for Adaptive LLM Pipelines

Uğur Çetintemel, Shu Chen, Alexander W. Lee, Deepti Raghavan, Duo Lu, Andrew Crotty

Deep Research is the New Analytics System: Towards Building the Runtime for AI-Driven Analytics

Matthew Russo, Tim Kraska

Session 2: Data Platform Benchmarking and Optimization Techniques

End-to-End Declarative Data Analytics: Co-designing Engines, Interfaces, and Cloud Infrastructure

Pinghe Li, Tom Kuchler, Marko Kabić, Tobias Stocker, Gustavo Alonso, Ana Klimovic

Survivorship Bias in Industrial Database Workloads

Ryan Marcus, Jeffrey Tao, Peizhi Wu, Zijie Zhao

A Multi-tenant Relational OLTP Database at Salesforce

Vaibhav Arora, Subho Chatterjee, Terry Chong, Thomas Fanghaenel, Pat Helland, Jamie Martin, Kaushal Mittal, Nat Wyatt

I Can't Believe It's Not Yannakakis: Pragmatic Bitmap Filters in Microsoft SQL Server

Hangdong Zhao, Yuanyuan Tian, Rana Alotaibi, Bailu Ding, Nicolas Bruno, Jesús Camacho-Rodríguez, Vassilis Papadimos, Ernesto Cervantes Juárez, Cesar Galindo-Legaria, Carlo Curino

Fast Vector Search in PostgreSQL: A Decoupled Approach

Jiayi Liu, Yunan Zhang, Chenzhe Jin, Aditya Gupta, Shige Liu, Jianguo Wang

Session 3: Text-to-SQL, Agents, LLMs, Oh My!

Text-to-SQL Benchmarks are Broken: An In-Depth Analysis of Annotation Errors

Tengjun Jin, Yoojin Choi, Yuxuan Zhu, Daniel Kang

Leveraging Query Optimizers to Verify the Soundness of LLM-based Query Rewrites for Real-World Workloads, and More

Vivek Narasayya, Surajit Chaudhuri

BenchPress: A Human-in-the-Loop Annotation System for Rapid Text-to-SQL Benchmark Curation

Fabian Wenz, Omar Bouattour, Devin Yang, Justin Choi, Cecil Gregg, Nesime Tatbul, Çağatay Demiralp

Please Don't Kill My Vibe: Empowering Agents with Data Flow Control

Charlie Summers, Haneen Mohammed, Eugene Wu

Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First

Shu Liu, Soujanya Ponnapalli, Shreya Shankar, Sepanta Zeighami, Alan Zhu, Shubham Agarwal, Ruiqi Chen, Samion Suwito, Shuo Yuan, Ion Stoica, Matei Zaharia, Alvin Cheung, Natacha Crooks, Joseph E. Gonzalez, Aditya G. Parameswaran

Session 4: Distributed Coordination and Consistency

Consistency and Correctness in Data-Oriented Workflow Systems

Michael Stonebraker, Xinjing Zhou, Peter Kraft, Qian Li

Event Horizon: Asymmetric Dependencies for Fast Geo-Distributed Operations

Jonathan Arns, Harald Ng, Kyriakos Psarakis, Asterios Katsifodimos, Paris Carbone

Privacy Meets Regulations: Shaping the Future of Work

Mohammad Javad Amiri, Tristan Allard, Boon Thau Loo, Divyakant Agrawal, Amr El Abbadi

Rosé: Flexible Replication With Strong Semantics For Partitioned Databases

Ioannis Zarkadas, Kelly Kostopoulou, Thomas Graham, Junfeng Yang, Philip A. Bernstein, Asaf Cidon, Tamer Eldeeb

Session 5: SQL and Data Modeling

On the Vexing Difficulty of Evaluating IN Predicates

Altan Birler, Thomas Neumann

Raqlet: Cross-Paradigm Compilation for Recursive Queries

Amir Shaikhha, Youning Xia, Meisam Tarabkhah, Jazal Saleem, Anna Herlihy

Semantic Data Modeling, Graph Query, and SQL, Together at Last?

Jeff Shute, Colin Zheng, Romit Kudtarkar

Database Research needs an Abstract Relational Query Language

Wolfgang Gatterbauer, Diandre Miguel B. Sabale

Session 6: Data Integration and Wrangling

Towards Scalable Visual Data Wrangling via Direct Manipulation

El Kindi Rezig, Mir Mahathir Mohammad, Nicolas Baret, Ricardo Mayerhofer, Andrew McNutt, Paul Rosen

The Pneuma Project: Reifying Information Needs as Relational Schemas to Automate Discovery, Guide Preparation, and Align Data with Intent

Muhammad Imam Luthfi Balaka, Raul Castro Fernandez

A Vision for Autonomous Data Agent Collaboration: From Query-by-Integration to Query-by-Collaboration

Timo Eckmann, Carsten Binnig

Session 7: Memory, I/O, and Data Movement in Modern Data Systems

Flexible I/O for Database Management Systems with xNVMe

Emil Houlborg, Andreas Nicolaj Tietgen, Simon A. F. Lund, Marcel Weisgut, Tilmann Rabl, Javier González, Vivek Shah, Pınar Tözün

Declarative Memory Services

Jeronimo Castrillon, Jana Giceva, Yu Hua, Kimberly Keeton, Akhil Shekar, Kevin Skadron, Tianzheng Wang, Huanchen Zhang

Data Movement-Aware GPU Sharing for Data-Intensive Systems

Yi Jiang, Hamish Nicholson, Viktor Sanca, Anastasia Ailamaki

Cloudspecs: Cloud Hardware Evolution Through the Looking Glass

Till Steinert, Maximilian Kuschewski, Viktor Leis

Session 8: Hardware-Accelerated Query Processing

Rethinking Analytical Processing in the GPU Era

Bobbi Yogatama, Yifei Yang, Kevin Kristensen, Devesh Sarda, Abigale Kim, Adrian Cockcroft, Yu Teng, Joshua Patterson, Gregory Kimball, Wes McKinney, Weiwei Gong, Xiangyao Yu

Raster is Faster: Rethinking Ray Tracing in Database Indexing

Harish Doraiswamy, Jayant R. Haritsa

Does A Fish Need a Bicycle? The Case for On-Chip NPUs in DBMS

Alexander Baumstark, Kai-Uwe Sattler

Hash Joins Meet CXL: A Fresh Look

Wentao Huang, Mian Lu, Kian-Lee Tan