Why Are Learned Indexes So Effective but Sometimes Ineffective?

Authors:

Qiyu Liu, Siyuan Han, Yanlin Qi, Jingshu Peng, Jin Li, Longlong Lin, Lei Chen

Download PDF

Abstract

Learned indexes have attracted significant research interest due to their potential to offer better space-time trade-offs compared to B+tree variants. Among various learned indexes, the PGM-Index based on error-bounded piecewise linear approximation is an elegant data structure that has demonstrated provably superior performance over conventional B+-tree indexes. However, despite numerous efforts to optimize the design of the PGM-Index, few systematically study the root causes of performance mismatches observed in practice. In this paper, we explore two key research questions. Q1 : Why are PGM-Indexes theoretically effective? and Q2 : Why do PGM-Indexes underperform in practice? For Q1 , we show that for a set of 𝑁 sorted keys, the PGM-Index can achieve a lookup time of 𝑂 ( log log 𝑁 ) while using 𝑂(𝑁) space. For Q2 , we identify that querying PGM-Indexes is highly memory-bound, where the internal index search operations often become the bottleneck. To fill the performance gap, we propose PGM++, a simple yet effective extension to the original PGM-Index that employs a mixture of different search strategies, with hyper-parameters automatically tuned through a cost model calibrated by theoretical findings. Extensive experiments show that, at comparable space costs, PGM++ speeds up index lookup queries by up to 2.31 × and 1.56 × when compared to the original PGM-Index and SOTA baselines.

PVLDB is part of the VLDB Endowment Inc.

Start

Current Submission

All Volumes

Reproducibility

General Information

Volume 18, No. 9

Why Are Learned Indexes So Effective but Sometimes Ineffective?

Abstract