Volume 18, No. 12
FDepHunter: Harnessing Negative Examples to Expose Fakes and Reveal Ghosts
Abstract
Functional dependency (FD) discovery is fundamental to data profiling. Existing discovery approaches are inevitably prone to returning fake FDs, which hold only coincidentally in the given data. Moreover, these approaches fail to identify ghost FDs, which would be observable in a clean dataset but remain undetected because outliers in the data violate them. We introduce an interactive method for dependency discovery that augments an Armstrong relation with additional tuples. We rely on artificially generated negative examples that emulate real-world tuples to help expose fake FDs. In addition, we rely on domain experts to confirm that positive examples indeed reflect the characteristics of the original dataset. Our prototype tool, FDepHunter, thus provides a novel human-in-the-loop workflow in which the set of discovered FDs can be iteratively refined.
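To make the notions of fake and ghost FDs concrete, here is a minimal sketch in Python. It is not FDepHunter's implementation: the helper `fd_holds`, the sample tuples, the `negative` tuple and the `outlier` tuple are all hypothetical illustrations, and the sketch does not model Armstrong relation construction or expert feedback. It only shows the basic check that an FD `lhs -> rhs` holds when any two tuples agreeing on `lhs` also agree on `rhs`, how one realistic added tuple can break a coincidental FD, and how one outlier can hide a genuine FD.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical model: a tuple is a mapping from attribute name to value.
Row = Dict[str, str]


def fd_holds(rows: Sequence[Row], lhs: Tuple[str, ...], rhs: str) -> bool:
    """Check whether the FD lhs -> rhs holds: any two rows that agree on
    every attribute in lhs must also agree on rhs."""
    rhs_for_key: Dict[Tuple[str, ...], str] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # setdefault stores the first rhs value seen for this key and
        # returns the stored value, so a mismatch signals a violation.
        if rhs_for_key.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


rows: List[Row] = [
    {"zip": "10115", "city": "Berlin", "name": "Anna"},
    {"zip": "80331", "city": "Munich", "name": "Ben"},
    {"zip": "10115", "city": "Berlin", "name": "Clara"},
]

# Both FDs hold on this small instance; name -> city holds only because
# every name happens to be unique, so it is a candidate fake FD.
print(fd_holds(rows, ("zip",), "city"))   # True
print(fd_holds(rows, ("name",), "city"))  # True

# Hypothetical negative example: a plausible tuple that reuses an existing
# name with a different city. It violates name -> city but not zip -> city.
negative: Row = {"zip": "80331", "city": "Munich", "name": "Anna"}
augmented = rows + [negative]
print(fd_holds(augmented, ("name",), "city"))  # False
print(fd_holds(augmented, ("zip",), "city"))   # True

# Ghost FD: a single outlier (a misspelled city) hides zip -> city,
# which becomes observable again once the outlier is removed.
outlier: Row = {"zip": "10115", "city": "Berlim", "name": "Dora"}
dirty = rows + [outlier]
print(fd_holds(dirty, ("zip",), "city"))  # False
print(fd_holds(rows, ("zip",), "city"))   # True
```

The check is a single pass with a dictionary keyed by the left-hand-side values, so it runs in time linear in the number of tuples; a real discovery algorithm would search over many candidate left-hand sides rather than testing a fixed pair.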
PVLDB is part of the VLDB Endowment Inc.