Flexible I/O for Database Management Systems with xNVMe

Authors:
Emil Houlborg, Andreas Nicolaj Tietgen, Simon A. F. Lund, Marcel Weisgut, Tilmann Rabl, Javier González, Vivek Shah, Pınar Tözün
Abstract

Today, NVMe SSDs cover a diverse family of devices (e.g., Zoned Namespaces, Flexible Data Placement, and Key-Value SSDs) and offer high performance (μsec-scale latency). To leverage the capabilities of these devices, a variety of I/O paths are available (e.g., libaio, io_uring, and SPDK). However, to avoid the challenges and unpredictability that come with writing code to target such diversity, most data systems today still rely on conventional filesystem APIs (POSIX) and synchronous I/O. While this choice may increase programmer productivity, it leads to suboptimal utilization of modern NVMe storage. To unify the diverse I/O storage paths and make them accessible to a wider range of programmers, Samsung built xNVMe, which exposes a single message-passing API with minimal overhead. This paper takes the next step and integrates xNVMe into a state-of-the-art database system, DuckDB, by creating a new filesystem extension, nvmefs, that interacts with blocks on disk instead of files. We demonstrate that the xNVMe integration allows DuckDB to utilize IO Passthru, SPDK, and Flexible Data Placement. Using these modern I/O methods, nvmefs achieves performance comparable to DuckDB's default synchronous I/O on non-I/O-intensive queries and up to 50% lower query times on I/O-intensive queries.
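
To give a concrete sense of the single message-passing API the abstract refers to, the sketch below issues one synchronous read through xNVMe's C API. This is a minimal illustration rather than code from the paper; the device URI and block address are assumptions, and the backend (e.g., io_uring, IO Passthru, or SPDK) is selected via the options struct rather than by changing the I/O code itself.

    #include <stdio.h>
    #include <libxnvme.h>

    int main(void)
    {
        // Backend and async interface are chosen through the options;
        // the read below stays the same across io_uring, IO Passthru, SPDK, etc.
        struct xnvme_opts opts = xnvme_opts_default();

        // Example device URI; an NVMe namespace such as /dev/nvme0n1 is assumed.
        struct xnvme_dev *dev = xnvme_dev_open("/dev/nvme0n1", &opts);
        if (!dev) {
            perror("xnvme_dev_open");
            return 1;
        }

        // Allocate a buffer suitable for the chosen backend
        // (e.g., DMA-able memory when running over SPDK).
        size_t nbytes = xnvme_dev_get_geo(dev)->lba_nbytes;
        char *buf = xnvme_buf_alloc(dev, nbytes);

        // Prepare a command context and read one logical block starting at
        // LBA 0 (the NVMe nlb field is zero-based, so 0 means one block).
        struct xnvme_cmd_ctx ctx = xnvme_cmd_ctx_from_dev(dev);
        int err = xnvme_nvm_read(&ctx, xnvme_dev_get_nsid(dev), 0x0, 0, buf, NULL);
        if (err || xnvme_cmd_ctx_cpl_status(&ctx)) {
            fprintf(stderr, "read failed\n");
        }

        xnvme_buf_free(dev, buf);
        xnvme_dev_close(dev);
        return err;
    }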