go back
go back
Volume 18, No. 5
BACH: Bridging Adjacency List and CSR Format using LSM-Trees for HGTAP Workloads
Abstract
Modern data-intensive applications require databases that support fast analytical processing on massive dynamic graphs in real time, while simultaneously providing transactional guarantees for modi- fying graph-based objects (i.e., edges, vertices and their properties). Achieving efficient Hybrid Graph Transactional/Analytical Pro- cessing (HGTAP) in a database poses significant challenges due to the simultaneous requirements of high operation throughput, high data freshness, and high performance isolation when processing concurrent read/write queries on intricate graph topology. Existing disk-based graph databases fail to meet these requirements at the same time due to their inclined data layout, such as the transac- tional storage based on adjacency list and the analytical storage based on CSR (compressed sparse row) format. To address these challenges, we present BACH (Bridging Adja- cency List and CSR Format using LSM (Log-Structured Merge)- Trees for HGTAP Workloads) to fill the gaps in HGTAP databases. BACH expands the design space of traditional LSM-Trees to ac- commodate different graph data layouts in different levels. The compaction process is further extended to seamlessly transform the graph layout from the TP-friendly adjacency list to the AP- friendly CSR format through the data propagation to deeper levels in the LSM-Tree. A novel compaction policy, namely elastic merge, is carefully devised to adapt to diverse workloads and the skew vertex degree distribution on graph data. These techniques lead to a Graph-aware Real-time (GR)-LSM-Tree, which can provide consis- tently efficient data access for diverse workloads throughout the entire lifespan of graph objects. Then, a lightweight multi-version scheme is devised for the GR-LSM-Tree to accelerate the concur- rent read/write processing with the snap-shot isolation guarantee. Comprehensive experiments demonstrate that BACH significantly outperforms other disk-based graph database solutions in HGTAP workloads.
PVLDB is part of the VLDB Endowment Inc.
Privacy Policy