go back

Volume 18, No. 11

Federated and Balanced Clustering for High-dimensional Data

Authors:
Yushuai Ji, Shengkun Zhu, Shixun Huang, Zepeng Liu, Sheng Wang, Zhiyong Peng

Abstract

Balanced π‘˜ -means ensures representative centroids by forming equal-sized clusters, but struggles with slow clustering of massive distributed attributes and data-sharing restrictions. A common approach is adapting it to a vertical federated learning (VFL) framework, preventing raw data exposure by only intermediate result exchange and accelerating clustering via parallelism, yet it remains unexplored. In this paper, we propose a t ime-efficient, f e derated, and b alanced π‘˜ - means algorithm, called Teb-means , to bridge the gap. We first formulate the balanced π‘˜ -means problem as a trace maximization problem (TMP) and propose an efficient coordinatewise optimization (CO) scheme to solve it. We then integrate TMP and CO into the VFL framework by demonstrating that TMP can be decomposed into multiple subproblems based on each party’s data, which can be solved using CO while exchanging only intermediate results. Notably, we build a trade-off between utility and communication efficiency by designing a greedy block-based strategy for CO (GBCO). Our theoretical analysis shows that Teb-means achieves linear time complexity on each client, and our communication round is constant in the mild condition. Experiments show that Teb-means is on average 12.18 Γ— faster than other balanced clustering algorithms that can be federated, while achieving better balance without disrupting the cluster structure.

PVLDB is part of the VLDB Endowment Inc.

Privacy Policy