go back

Volume 18, No. 11

GraphCSR: A Degree-Equalized CSR Format for Large-scale Graph Processing

Authors:
Xinbiao Gan, Tiejun Li, Chunye Gong, Dongsheng Li, Dezun Dong, Jie Liu, Kai Lu

Abstract

Graph processing underpins a vast array of data-centric applications, serving as a crucial component in fields such as social network analysis, recommendation systems, bio-informatics, and search engines. As graph data grows in scale and complexity, highperformance graph processing is increasingly essential. Many graph processing tasks depend on efficient data structures to manage the sparsity typical of real-world graphs, where most vertices have limited connectivity. This sparsity poses challenges for memory and computational efficiency in large-scale graph processing, and conventional sparse formats like Compressed Sparse Row (CSR) often struggle with memory and computation inefficiencies when handling massive graphs. To address these challenges, we introduce GraphCSR, a degree-equalized CSR format specifically tailored to enhance the spatio-temporal efficiency of distributed graph processing across various tasks. GraphCSR aggregates low-degree vertices into synthetic high-degree ones and applies group-wise compression to reduce storage overhead by recording only the starting index for each aggregated group. This reduces memory usage and supports batch-memory access to improve performance. Our extensive evaluations in various graph processing algorithms and datasets demonstrate that GraphCSR not only reduces the memory footprint required for large-scale graphs, but also improves performance across multiple types of graph processing tasks, outperforming popular sparse storage formats. Furthermore, when deployed on a production-scale supercomputer with 79,024 nodes, GraphCSR achieved a graph processing throughput that exceeded the top-ranked system on the Graph500 benchmark.

PVLDB is part of the VLDB Endowment Inc.

Privacy Policy