go back

Volume 18, No. 9

Why Are Learned Indexes So Effective but Sometimes Ineffective?

Authors:
Qiyu Liu, Siyuan Han, Yanlin Qi, Jingshu Peng, Jin Li, Longlong Lin, Lei Chen

Abstract

Learned indexes have attracted significant research interest due to their potential to offer better space-time trade-offs compared to B+tree variants. Among various learned indexes, the PGM-Index based on error-bounded piecewise linear approximation is an elegant data structure that has demonstrated provably superior performance over conventional B+-tree indexes. However, despite numerous efforts to optimize the design of the PGM-Index, few systematically study the root causes of performance mismatches observed in practice. In this paper, we explore two key research questions. Q1 : Why are PGM-Indexes theoretically effective? and Q2 : Why do PGM-Indexes underperform in practice? For Q1 , we show that for a set of 𝑁 sorted keys, the PGM-Index can achieve a lookup time of 𝑂 ( log log 𝑁 ) while using 𝑂(𝑁) space. For Q2 , we identify that querying PGM-Indexes is highly memory-bound, where the internal index search operations often become the bottleneck. To fill the performance gap, we propose PGM++, a simple yet effective extension to the original PGM-Index that employs a mixture of different search strategies, with hyper-parameters automatically tuned through a cost model calibrated by theoretical findings. Extensive experiments show that, at comparable space costs, PGM++ speeds up index lookup queries by up to 2.31 Γ— and 1.56 Γ— when compared to the original PGM-Index and SOTA baselines.

PVLDB is part of the VLDB Endowment Inc.

Privacy Policy