Volume 18, No. 4
DumpKV: Learning based lifetime aware garbage collection for key value separation in LSM-tree
Abstract
Key-value separation is used in LSM-trees to store large values in separate log files. This reduces write amplification but requires garbage collection (GC) to recycle invalid values. Existing LSM-tree KV stores typically adopt a static GC policy to recycle obsolete values. They struggle to achieve low write amplification because it is difficult to predefine suitable static GC parameters. In this work we propose DumpKV, a learning-based, lifetime-aware garbage collection mechanism that achieves lower write amplification. DumpKV trains a machine learning model on the access history of keys and uses this lightweight model to predict the lifetime of each key, and the predicted lifetime can be used to guide garbage collection. To reduce the interference of garbage collection with write throughput, DumpKV collects features during L0-L1 compaction, leveraging the fact that the LSM-tree is small under KV separation. Experimental results show that, compared with baseline KV-separation LSM-tree KV stores, DumpKV reduces GC write size by 25.7%-53.3% on real-world workloads and by 19%-65% on synthetic workloads, with small feature storage overhead.
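The abstract does not specify DumpKV's model, features, or victim-selection rule, so what follows is only a minimal Python sketch of the general idea of lifetime-guided GC, under stated assumptions. A simple mean-gap heuristic (`predict_lifetime`) stands in for DumpKV's learned model. Lifetimes are measured in hypothetical logical write ticks, and `BlobFile` and `pick_gc_victim` are hypothetical names. Each value in a value log file carries its write tick and predicted lifetime, and the collector picks the file whose values are most likely to have expired.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


def predict_lifetime(update_ticks: list[int], default: float = float("inf")) -> float:
    """Hypothetical stand-in for a learned model (NOT DumpKV's model):
    predict a key's lifetime as the mean gap between its past updates."""
    if len(update_ticks) < 2:
        return default  # no update history: assume the value never expires
    gaps = [b - a for a, b in zip(update_ticks, update_ticks[1:])]
    return mean(gaps)


@dataclass
class BlobFile:
    """Hypothetical value log file; each entry is
    (write_tick, predicted_lifetime, size_bytes)."""
    file_id: int
    entries: list[tuple[int, float, int]] = field(default_factory=list)

    def expected_garbage_ratio(self, now: int) -> float:
        """Fraction of bytes whose predicted lifetime has elapsed by `now`."""
        total = sum(size for _, _, size in self.entries)
        if total == 0:
            return 0.0
        dead = sum(size for tick, life, size in self.entries if tick + life <= now)
        return dead / total


def pick_gc_victim(files: list[BlobFile], now: int, threshold: float) -> BlobFile | None:
    """Pick the file with the highest expected garbage ratio, or None if
    no file reaches `threshold` (an assumed tuning knob)."""
    best = max(files, key=lambda f: f.expected_garbage_ratio(now), default=None)
    if best is None or best.expected_garbage_ratio(now) < threshold:
        return None
    return best


if __name__ == "__main__":
    # Key "a" was updated every 10 ticks, "c" every tick, "b" only once.
    life_a = predict_lifetime([0, 10, 20])
    life_b = predict_lifetime([0])
    life_c = predict_lifetime([5, 6, 7])
    f1 = BlobFile(1, [(20, life_a, 100), (7, life_c, 100)])
    f2 = BlobFile(2, [(0, life_b, 100)])
    victim = pick_gc_victim([f1, f2], now=30, threshold=0.5)
    print(victim.file_id if victim else None)
```

Ranking files by expected rather than observed garbage ratio is one way a predicted lifetime could steer GC toward files that are mostly obsolete. The actual system's policy may differ.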
PVLDB is part of the VLDB Endowment Inc.