Volume 18, No. 8

PipeTGL: (Near) Zero Bubble Memory-based Temporal Graph Neural Network Training via Pipeline Optimization

Authors:
Jun Liu, Bingqian Du, Ziyue Luo, Sitian Lu, Qiankun Zhang, Hai Jin

Abstract

Memory-based Temporal Graph Neural Networks (M-TGNNs) demonstrate superior performance in dynamic graph learning tasks. Their success is attributed to a memory module, which captures historical information for each node and implicitly imposes a memory dependency constraint among chronologically ordered minibatches. This unique characteristic of M-TGNNs introduces new challenges for parallel training that have not been encountered before. Existing parallelism strategies for M-TGNNs either sacrifice memory accuracy (minibatch parallelism and epoch parallelism) or compromise space efficiency (memory parallelism) to optimize runtime. This paper proposes a pipeline parallel approach for multi-GPU M-TGNN training that effectively addresses both inter-minibatch memory dependencies and intra-minibatch task dependencies, based on a runtime analysis DAG for M-TGNNs. We further optimize pipeline efficiency by incorporating improved scheduling, finer-grained operation reorganization, and targeted communication optimizations tailored to the specific training properties of M-TGNNs. These enhancements significantly reduce GPU waiting and idle time caused by memory dependencies and frequent communication, resulting in zero pipeline bubbles for common training configurations. Extensive evaluations demonstrate that PipeTGL achieves a speedup of 1.27x to 4.74x over other baselines while also improving the accuracy of M-TGNN training across multiple GPUs.
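To make the inter-minibatch memory dependency concrete, the following is a minimal sketch (not the authors' code) of memory-based TGNN training: each minibatch must read the node-memory state written by the previous minibatch before it can compute, which is what constrains chronologically ordered minibatches and motivates PipeTGL's pipeline design. Names such as `NodeMemory`, `encoder`, and `updater` are illustrative assumptions, not PipeTGL's API.

```python
import torch


class NodeMemory:
    """Per-node memory table; minibatch t reads the state written by minibatch t-1."""

    def __init__(self, num_nodes: int, dim: int):
        self.state = torch.zeros(num_nodes, dim)

    def read(self, nodes: torch.Tensor) -> torch.Tensor:
        return self.state[nodes]

    def write(self, nodes: torch.Tensor, new_state: torch.Tensor) -> None:
        self.state[nodes] = new_state.detach()


def train_epoch(batches, memory: NodeMemory, encoder, updater, loss_fn, opt):
    # Minibatches are processed in chronological order: each batch reads the
    # memory produced by earlier batches and overwrites it afterwards. This
    # read-after-write chain is the inter-minibatch dependency that a pipeline
    # schedule must respect; the forward/loss/backward steps within a batch
    # form the intra-minibatch task dependencies.
    for src, dst, ts, feats, labels in batches:
        nodes = torch.cat([src, dst])
        mem = memory.read(nodes)            # depends on the previous batch's write
        emb = encoder(mem, feats, ts)       # embed events using the current memory
        loss = loss_fn(emb, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
        memory.write(nodes, updater(mem, emb))  # new state visible to the next batch
```

In a naive multi-GPU setup, the `memory.read` of one minibatch cannot start until the `memory.write` of the previous minibatch (possibly on another GPU) has completed, which is the source of the waiting and idle time the paper's scheduling and communication optimizations target.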
