Volume 18, No. 11

OmniMatch: Joinability Discovery in Data Products

Authors:
Christos Koutras, Jiani Zhang, Xiao Qin, Chuan Lei, Vassilis Ioannidis, Christos Faloutsos, George Karypis, Asterios Katsifodimos

Abstract

We propose OmniMatch, a novel joinability discovery technique specifically tailored to the needs of data products: cohesive, curated collections of tabular datasets. OmniMatch combines multiple column-pair similarity measures by leveraging self-supervised Graph Neural Networks (GNNs). OmniMatch's GNN captures column relatedness using graph neighborhood information, which significantly improves recall on joinability discovery tasks. At the same time, OmniMatch increases its precision by augmenting its training data with negative column-join examples produced by an automated negative example generation process. Compared to state-of-the-art methods, OmniMatch achieves up to 14% higher effectiveness in F1 score and AUC, without relying on individual, user-provided thresholds for each similarity metric.
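The idea of combining several column-pair similarity measures into one graph can be illustrated with a small sketch. The code below is not OmniMatch's implementation. The abstract does not say which measures it uses, so the three here (value Jaccard, value containment, and column-name token Jaccard) are assumptions chosen for illustration, and the names `build_similarity_graph` and `pair_similarities` are hypothetical. The sketch only builds a multi-edge graph in which each nonzero measure becomes a separate labeled edge between two columns. The self-supervised GNN that would learn over this graph, and the negative example generation, are not shown.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a: set, b: set) -> float:
    """Fraction of A's distinct values that also appear in B."""
    return len(a & b) / len(a) if a else 0.0


def name_tokens(column_id: str) -> set:
    # Hypothetical convention: ids look like "table.column_name";
    # "orders.customer_id" -> {"customer", "id"}
    column_name = column_id.split(".")[-1]
    return set(column_name.lower().split("_"))


def pair_similarities(id_a, vals_a, id_b, vals_b) -> dict:
    """Assumed example measures; OmniMatch's actual measures may differ."""
    va, vb = set(vals_a), set(vals_b)
    return {
        "value_jaccard": jaccard(va, vb),
        # Symmetric containment: the larger of the two directions.
        "containment": max(containment(va, vb), containment(vb, va)),
        "name_jaccard": jaccard(name_tokens(id_a), name_tokens(id_b)),
    }


def build_similarity_graph(columns: dict) -> list:
    """Return (col_a, col_b, measure, score) edges, one per nonzero measure."""
    edges = []
    for (id_a, vals_a), (id_b, vals_b) in combinations(columns.items(), 2):
        for measure, score in pair_similarities(id_a, vals_a, id_b, vals_b).items():
            if score > 0.0:
                edges.append((id_a, id_b, measure, score))
    return edges


# Usage with a toy data product of three columns.
columns = {
    "orders.customer_id": [1, 2, 3, 4],
    "customers.customer_id": [1, 2, 3, 4, 5],
    "products.price": [9.99, 19.99],
}
edges = build_similarity_graph(columns)
```

Keeping one edge per measure, rather than thresholding each measure and merging them into a single score, leaves the weighting of the signals to a downstream learner. This is in the spirit of the abstract's point about avoiding per-metric, user-provided thresholds.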

PVLDB is part of the VLDB Endowment Inc.