go back
go back
Volume 18, No. 11
Federated and Balanced Clustering for High-dimensional Data
Abstract
Balanced π -means ensures representative centroids by forming equal-sized clusters, but struggles with slow clustering of massive distributed attributes and data-sharing restrictions. A common approach is adapting it to a vertical federated learning (VFL) framework, preventing raw data exposure by only intermediate result exchange and accelerating clustering via parallelism, yet it remains unexplored. In this paper, we propose a t ime-efficient, f e derated, and b alanced π - means algorithm, called Teb-means , to bridge the gap. We first formulate the balanced π -means problem as a trace maximization problem (TMP) and propose an efficient coordinatewise optimization (CO) scheme to solve it. We then integrate TMP and CO into the VFL framework by demonstrating that TMP can be decomposed into multiple subproblems based on each partyβs data, which can be solved using CO while exchanging only intermediate results. Notably, we build a trade-off between utility and communication efficiency by designing a greedy block-based strategy for CO (GBCO). Our theoretical analysis shows that Teb-means achieves linear time complexity on each client, and our communication round is constant in the mild condition. Experiments show that Teb-means is on average 12.18 Γ faster than other balanced clustering algorithms that can be federated, while achieving better balance without disrupting the cluster structure.
PVLDB is part of the VLDB Endowment Inc.
Privacy Policy