Volume 18, No. 9
GpJSON: High-performance JSON Data Processing on GPUs
Abstract
The JavaScript Object Notation (JSON) format is ubiquitous, and countless applications depend on it to store and exchange high volumes of data. Despite its popularity, JSON is a very inefficient data format: decoding and querying JSON data is often a major bottleneck in data-intensive applications. In this paper, we explore how Graphics Processing Units (GPUs) can be used to parallelize both JSON de-serialization and querying. We show how JSON parsing can be implemented on GPUs by means of parallel structural index construction, and we describe how JSON data can then be queried in situ using a lightweight query engine designed to run on GPUs. We present the design and implementation of GpJSON, a GPU-based JSON data processing library. The library can be used from high-level languages such as JavaScript or Python, and features bindings for the GraalVM language runtime. Our evaluation on real-world datasets shows that, on a single NVIDIA Ampere A100, GpJSON achieves a speedup of at least 2.9× in end-to-end performance (de-serialization plus querying) over state-of-the-art parallel JSON parsers and query engines, and of 6-8× over NVIDIA RAPIDS.
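To give a rough idea of what a structural index is, the following CPU-side Python sketch records the positions of structural characters (`{`, `}`, `[`, `]`, `:`, `,`) that lie outside string literals. It is only an illustration under our own assumptions and is not the GpJSON implementation: the function names `unescaped_quotes` and `structural_index` are hypothetical, and the code runs sequentially. The formulation is chosen because each step is either element-wise or a prefix scan, which are the kinds of operations that map naturally onto data-parallel hardware.

```python
from itertools import accumulate
from operator import xor

# Characters that delimit JSON structure (hypothetical helper constant).
STRUCTURAL = set("{}[]:,")


def unescaped_quotes(text):
    """Return a 0/1 flag per character: 1 where a double quote is not escaped."""
    flags = []
    backslashes = 0  # length of the backslash run immediately before this char
    for ch in text:
        # A quote preceded by an even number of backslashes is a real delimiter.
        flags.append(1 if ch == '"' and backslashes % 2 == 0 else 0)
        backslashes = backslashes + 1 if ch == "\\" else 0
    return flags


def structural_index(text):
    """Return positions of structural characters that are outside strings."""
    quotes = unescaped_quotes(text)
    # Inclusive prefix XOR over the quote flags: the result is 1 from an
    # opening quote up to (but not including) its closing quote.
    in_string = list(accumulate(quotes, xor))
    return [
        i
        for i, ch in enumerate(text)
        if ch in STRUCTURAL and not in_string[i]
    ]


if __name__ == "__main__":
    record = '{"a": "x,y", "b": [1, 2]}'
    print(structural_index(record))
```

With such an index, a query for a given key can jump between colon and comma positions instead of re-scanning every byte, which is the general idea behind querying the data in situ.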
PVLDB is part of the VLDB Endowment Inc.