
Volume 18, No. 10

PS-MI: Accurate, Efficient, and Private Data Valuation in Vertical Federated Learning

Authors:
Xiaokai Zhou, Xiao Yan, Fangcheng Fu, Ziwen Fu, Tieyun Qian, Yuanyuan Zhu, Qinbo Zhang, Bin Cui, Jiawei Jiang

Abstract

Vertical federated learning (VFL) trains models when multiple databases (a.k.a. participants) hold different features of the same set of samples. By quantifying each participant’s contribution to model training, data valuation can deter free-riders and reward the parties that contribute most. However, vertical federated data valuation (VFDV) is challenging because it must be accurate and efficient while protecting participant data privacy. In this paper, we propose a method that meets all three requirements by using projection and sampling for mutual information estimation (hence the name PS-MI). In particular, we first show that the utility of a participant set (a.k.a. a coalition) can be expressed as the mutual information (MI) between its features and the target labels. MI is favorable because it does not depend on the model being trained (i.e., it is model-agnostic) and can be estimated via k-nearest neighbors (kNN). To run kNN, instead of using costly homomorphic encryption to protect data privacy, we apply simple random projection to participant features before distance computation. We prove that random projection ensures differential privacy and yields unbiased distance estimates. Since the contribution of a participant involves many coalitions, we adopt stratified sampling to reduce the number of coalitions evaluated while controlling estimation variance. To further improve efficiency, we incorporate optimizations including using locality-sensitive hashing (LSH) to prune kNN candidates, batching kNN candidate checking for multiple coalitions, and adaptive early termination for utility evaluation. We compare PS-MI with 5 state-of-the-art VFDV methods. The results show that PS-MI yields higher accuracy and shorter running time than the baselines, with a maximum speedup of 592×.
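The core pipeline in the abstract (project features, estimate coalition utility as kNN-based MI, then value participants by sampling coalitions) can be illustrated with a small Python sketch. It is not the authors' implementation, and several pieces are assumptions: all function names are hypothetical; the projection uses Gaussian entries drawn from N(0, 1/m) so that projected squared distances are unbiased; the MI estimator is a Ross-style kNN estimator for discrete labels (the abstract does not say which kNN estimator PS-MI uses); a coalition's distance is taken as the sum of its members' projected squared distances; and the participant value is assumed to be a Shapley value estimated by stratifying on coalition size. The differential-privacy calibration, LSH pruning, batched candidate checking, and early termination are omitted.

```python
import random

import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """psi(n) for a positive integer n: -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def project(X: np.ndarray, out_dim: int, rng: np.random.Generator) -> np.ndarray:
    """Assumed Gaussian random projection with N(0, 1/out_dim) entries, so
    projected squared distances equal the original ones in expectation."""
    R = rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(X.shape[1], out_dim))
    return X @ R


def sq_dists(Z: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sum(diff * diff, axis=-1)


def knn_mi(D: np.ndarray, y: np.ndarray, k: int = 3) -> float:
    """Ross-style kNN estimate of I(X; Y) for discrete y from a distance matrix.
    Squared distances work too, since only comparisons are used."""
    n = len(y)
    idx = np.arange(n)
    psi_k, psi_nx, psi_m = [], [], []
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (idx != i))
        if len(same) == 0:
            continue  # skip points whose label occurs only once
        ki = min(k, len(same))
        radius = np.sort(D[i, same])[ki - 1]  # k-th same-label neighbor
        m = int(np.sum(D[i] <= radius)) - 1  # any-label neighbors in radius, minus i
        psi_k.append(digamma_int(ki))
        psi_nx.append(digamma_int(len(same) + 1))
        psi_m.append(digamma_int(m))
    if not psi_k:
        return 0.0
    mi = digamma_int(len(psi_k)) + np.mean(psi_k) - np.mean(psi_nx) - np.mean(psi_m)
    return max(0.0, float(mi))


def make_utility(projected: dict, y: np.ndarray, k: int = 3):
    """u(S) = estimated MI between coalition S's features and y. Assumption: the
    coalition distance is the sum of members' projected squared distances, i.e.
    the squared distance over their concatenated projected features."""
    blocks = {p: sq_dists(Z) for p, Z in projected.items()}
    cache = {frozenset(): 0.0}

    def utility(S: frozenset) -> float:
        if S not in cache:
            cache[S] = knn_mi(sum(blocks[p] for p in S), y, k)
        return cache[S]

    return utility


def stratified_shapley(players: list, utility, per_stratum: int, rng: random.Random) -> dict:
    """Shapley values estimated by stratifying on coalition size: average sampled
    marginal gains within each size, then average over the n size strata."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        strata_means = []
        for size in range(n):
            gains = []
            for _ in range(per_stratum):
                S = frozenset(rng.sample(others, size))
                gains.append(utility(S | {p}) - utility(S))
            strata_means.append(sum(gains) / per_stratum)
        values[p] = sum(strata_means) / n
    return values


def demo(seed: int = 0, n: int = 200) -> dict:
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    raw = {
        "A": 2.0 * y[:, None] + rng.normal(size=(n, 4)),  # strongly informative
        "C": 0.5 * y[:, None] + rng.normal(size=(n, 4)),  # weakly informative
        "B": rng.normal(size=(n, 4)),  # pure noise
    }
    # Each participant projects its own features before distances are computed.
    projected = {p: project(X, 4, rng) for p, X in raw.items()}
    utility = make_utility(projected, y)
    return stratified_shapley(sorted(raw), utility, per_stratum=2, rng=random.Random(seed))


if __name__ == "__main__":
    print(demo())
```

On this synthetic data, `demo()` should rank the strongly informative participant A above both the weakly informative participant C and the noise-only participant B; the exact values depend on the seeds. Caching utilities per coalition matters because each coalition is reused across many marginal-gain terms, which is the same cost pressure that motivates sampling in the first place.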

PVLDB is part of the VLDB Endowment Inc.
