Federated Incomplete Tabular Data Prediction with Missing Complementarity

Authors:

Yan Zhang, Shuwei Liang, Xiaoye Miao, Yangyang Wu, Jianwei Yin

Abstract

Tabular data is abundant and crucial across both industry and academia. Federated learning (FL) offers a promising solution for analyzing tabular data distributed across multiple organizations without sharing each client's private information. Existing federated tabular data prediction methods optimize performance and limit privacy leakage under the assumption that the tabular data is complete. They are therefore not applicable in real-world scenarios where tabular data contains missing values. In this paper, we propose a novel federated prediction framework for incomplete tabular data, named DARN, which leverages missing complementarity to directly optimize prediction performance without relying on imputed values. It is especially beneficial when clients exhibit heterogeneity in their missing data distributions and their pairwise observed data are complementary. Specifically, each client trains a missing distribution learning model to capture the distribution of its local incomplete data. To assist in this, we present a missing-aware transformer block with a novel missing-aware attention mechanism that represents incomplete tabular data directly. The server calculates personalized weights for the prediction models by combining a missing complementary score and an observed sample size score, thereby maximizing the utility of the available data. Extensive experiments on four publicly available real-world datasets demonstrate that DARN outperforms state-of-the-art methods, with a 25.80% improvement in both classification and regression tasks.
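
The abstract does not give the formulas behind the two server-side scores, so the following is only an illustrative sketch of the general idea, not DARN's actual computation. It assumes each client reports a binary observation mask (1 = observed, 0 = missing). The function names, the complementarity definition (how often client j observes the features that client i is missing), and the mixing parameter `alpha` are all hypothetical choices made for this example.

```python
def observation_rates(mask):
    """Per-feature fraction of observed entries (rows are samples, 1 = observed)."""
    n_rows = len(mask)
    return [sum(col) / n_rows for col in zip(*mask)]


def personalized_weights(masks, alpha=0.5):
    """Return a row-normalized weight matrix w, where w[i][j] is the weight
    client i's personalized model gives to client j's model.

    Hypothetical scoring, NOT the paper's formula:
      - complementarity(i, j): sum over features of (1 - rate_i) * rate_j
      - size score(j): client j's share of all observed cells
      - alpha: assumed mixing coefficient between the two scores
    Assumes at least one observed cell exists across all clients.
    """
    rates = [observation_rates(m) for m in masks]
    observed = [sum(sum(row) for row in m) for m in masks]
    total_observed = sum(observed)
    size_score = [o / total_observed for o in observed]

    k = len(masks)
    weights = []
    for i in range(k):
        comp = [
            sum((1 - r_i) * r_j for r_i, r_j in zip(rates[i], rates[j]))
            for j in range(k)
        ]
        comp_total = sum(comp)
        # If client i has nothing missing, fall back to uniform complementarity.
        comp_score = [c / comp_total if comp_total > 0 else 1.0 / k for c in comp]
        # Both scores sum to 1 over j, so each mixed row also sums to 1.
        weights.append(
            [alpha * c + (1 - alpha) * s for c, s in zip(comp_score, size_score)]
        )
    return weights


# Two clients with perfectly complementary missing patterns.
client_a = [[1, 0], [1, 0]]  # observes only feature 0
client_b = [[0, 1], [0, 1]]  # observes only feature 1
weights = personalized_weights([client_a, client_b])
```

In this toy setup each client gives more weight to the other client's model than to its own, because the other client observes exactly the feature it lacks. That is one plausible reading of "missing complementarity"; the paper should be consulted for the actual definition.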

PVLDB is part of the VLDB Endowment Inc.

Volume 18, No. 10
