The HANA Native Query Engine for Lakehouse Systems

Authors:

Daniel Ritter, Mihnea Andrei, Sukhyeun Cho, Maik Goergens, Taehyung Lee, Norman May, Amit Pathak, Paul Willems

Abstract

Modern enterprise applications and data warehouse systems move data into data lakes for economic and scalability reasons. Data is then stored in popular columnar file formats like Parquet, which are optimized for writing, using open table formats like Iceberg or Delta. This presents new challenges for existing database systems and their execution engines, because excellent performance and scalability are expected for complex analytical queries over this data even though the data resides in a remote data lake. In this work, we present how we adapted the HANA Cloud Database Engine for efficient processing of files in data lakes, which we call SQL-on-Files (SoF). We motivate this evolution by its relevance for Business Data Cloud, SAP’s Lakehouse; discuss the viability of general architecture choices such as pushdown and direct access architectures; and give insights into our SoF design decisions for scalable analytical query processing, covering the execution engine, optimizer, and caching. Our evaluation of SoF shows benefits of direct access over pushdown architectures on a new warehouse benchmark with complex analytical workloads.

Keywords

Cloud Data Platform, Database System, Data Lake, Lakehouse

PVLDB is part of the VLDB Endowment Inc.

Volume 18, No. 12
