Volume 18, No. 12

Streaming View: An Efficient Data Processing Engine for Modern Real-time Data Warehouse of Alibaba Cloud

Authors:
Fangyuan Zhang, Mengqi Wu, Chunlei Xu, Yunong Bao, Jiyu Qiao, Yingli Zhou, Hua Fan, Caihua Yin, Wenchao Zhou, Feifei Li

Abstract

Real-time data warehouses are essential for modern applications. Extract-Transform-Load (ETL), a fundamental component of offline data warehouses, also provides crucial support within real-time data warehouses. Among traditional ETL approaches, Lambda and Kappa have emerged as classic real-time data processing solutions because their freshness and query performance best meet business demands. However, both often require integrating external stream processing engines, which introduces challenges related to complexity, efficiency, and consistency. ZeroETL has emerged as an approach to address these issues. Nevertheless, existing ZeroETL-based solutions primarily emphasize extraction and loading, which limits their ability to handle transformation. Incremental View Maintenance (IVM) offers an alternative that can enhance ZeroETL. However, existing IVM implementations often focus on query acceleration rather than on supporting high-throughput, complex real-time workloads. To address these challenges, we propose Streaming View, an efficient real-time data processing engine integrated within AnalyticDB of Alibaba Cloud. Unlike existing solutions, Streaming View supports high-throughput, complex data processing for real-time streaming ETL workloads. Furthermore, it can be leveraged to optimize ZeroETL-based approaches by enhancing their transformation capabilities. We design tailored algorithms and optimizations for diverse syntaxes and high-throughput scenarios, ensuring that the system meets complex application needs. By integrating incremental computation into the data warehouse, Streaming View reduces complexity, ensures data consistency, and boosts performance, offering a robust solution for real-world applications. Experiments show that Streaming View improves processing performance by up to 7x over traditional ETL methods and up to 20x over IVM methods, and that it addresses complex scenarios unsolved by existing solutions.

PVLDB is part of the VLDB Endowment Inc.
