Data Movement-Aware GPU Sharing for Data-Intensive Systems

Authors:
Yi Jiang, Hamish Nicholson, Viktor Sanca, Anastasia Ailamaki
Abstract

Modern data platforms require GPU acceleration for both analytical query processing and AI inference. These workloads exhibit contrasting resource bottlenecks: analytics saturates the PCIe interconnect with data transfers, while inference stresses on-device computation. We observe that when colocated, these workloads create opportunities rather than conflicts: one workload’s bottleneck corresponds to the other’s idle resources. Yet existing GPU sharing mechanisms fail to exploit this insight and degrade performance. We present GPU Unified Sharing with Transfer-awareness (GUST), a scheduler that treats the PCIe interconnect as a first-class, schedulable resource alongside compute and memory. Our approach monitors each workload’s interconnect intensity to classify tasks as transfer-intensive or device-intensive, then interleaves their execution. By scheduling transfer-intensive analytics kernels to saturate PCIe bandwidth while running device-intensive inference kernels in the gaps, our scheduler achieves high utilization of both the interconnect and GPU compute resources. In experiments that colocate four mixed analytics and inference workloads, our prototype reduces the slowdown relative to dedicated-GPU execution from 3.9-7× (geometric mean) under existing sharing mechanisms to 2.8×. By making data movement visible to the scheduler, we transform GPUs from dedicated accelerators into efficiently shared computational resources suitable for the heterogeneous workloads of modern data systems.
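
To make the classify-and-interleave idea concrete, the following is a minimal Python sketch of a transfer-aware ordering policy. The kernel names, the `TRANSFER_INTENSITY_THRESHOLD` value, and the bytes-per-compute-millisecond metric are illustrative assumptions for exposition, not GUST's actual implementation or measured parameters.

```python
from dataclasses import dataclass

# Hypothetical classification threshold: kernels that move more PCIe bytes
# per millisecond of device compute than this are "transfer-intensive".
TRANSFER_INTENSITY_THRESHOLD = 1e6  # bytes per compute-ms (illustrative value)

@dataclass
class Kernel:
    name: str
    pcie_bytes: float    # observed host<->device traffic for this kernel
    compute_ms: float    # measured on-device execution time

    def transfer_intensive(self) -> bool:
        # Classify by observed interconnect intensity.
        return self.pcie_bytes / max(self.compute_ms, 1e-9) > TRANSFER_INTENSITY_THRESHOLD

def interleave(pending):
    """Alternate transfer-intensive and device-intensive kernels so PCIe
    transfers overlap with on-device compute instead of serializing."""
    transfers = [k for k in pending if k.transfer_intensive()]
    computes = [k for k in pending if not k.transfer_intensive()]
    order = []
    while transfers or computes:
        if transfers:
            order.append(transfers.pop(0))  # keeps the interconnect busy
        if computes:
            order.append(computes.pop(0))   # fills the PCIe "gap" with compute
    return [k.name for k in order]

if __name__ == "__main__":
    # Hypothetical mixed batch: analytics scans (transfer-bound) and
    # inference layers (compute-bound).
    mixed = [
        Kernel("scan_lineitem", pcie_bytes=4e9, compute_ms=50),
        Kernel("bert_layer_3",  pcie_bytes=1e6, compute_ms=40),
        Kernel("scan_orders",   pcie_bytes=2e9, compute_ms=30),
        Kernel("bert_layer_4",  pcie_bytes=1e6, compute_ms=40),
    ]
    print(interleave(mixed))
    # -> ['scan_lineitem', 'bert_layer_3', 'scan_orders', 'bert_layer_4']
```

In an actual GPU runtime the interleaving would be enforced with asynchronous copies and separate streams; the sketch only shows the ordering decision that keeps both the interconnect and the compute units busy.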