
Volume 18, No. 8

HoliPaxos: Towards More Predictable Performance in State Machine Replication

Authors:
Zhiying Liang, Vahab Jabrayilov, Abutalib Aghayev, Aleksey Charapko

Abstract

State machine replication (SMR) algorithms ensure redundancy in critical systems and, as a result, underpin fault-tolerant distributed databases. Good SMR protocol performance is essential for capacity planning and meeting desired performance objectives. However, many implementations of popular SMR algorithms, such as MultiPaxos and Raft, have issues that make their performance unpredictable. This unpredictability often arises from certain “bolt-on” additions to core protocols, such as external failure detectors and replication log compaction. In this paper, we argue that tighter integration of such traditionally ad-hoc mechanisms with the core replication protocols can stabilize performance, making the solutions more reliable and more amenable to accurate capacity planning. Moreover, we show that these integrations can be nondisruptive for the underlying consensus algorithm, resulting in systems that preserve the simplicity and safety of traditional single-leader consensus-based SMR. To that end, we integrate the failure and slowdown detectors into the SMR protocol and achieve better performance and faster fail-over under various network partitions and node slowdown events. We also illustrate that tight integration of replication log management, pruning, and snapshotting can reduce memory and CPU usage while avoiding the performance fluctuations associated with traditional log compaction and cleanup approaches.

PVLDB is part of the VLDB Endowment Inc.
