go back
go back
Volume 18, No. 7
Efficient Discovery of Relaxed Functional Dependencies
Abstract
This paper studies the discovery of relaxed functional dependencies ( RFDs ). We consider RFDs that relax restrictions in both value equality and constraint satisfaction: treating values as equal if their distance is less than a given similarity threshold, and considering RFDs with violations below a given error threshold as valid. As a highly non-trivial extension of the row-based approach to functional dependency ( FD ) discovery, we present the first algorithm capable of discovering all valid and minimal RFDs . We extend the structure called “ difference-set ” for predicates that are combinations of attributes and similarity thresholds. We present an efficient method for difference-set construction, incorporating optimizations for both time and space complexity. When inferring RFDs from difference-sets, we enumerate RFDs based on the subsumption relationship of their right-hand-side predicates to share computations. An extensive experimental evaluation verifies that the proposed discovery algorithm is faster than baseline methods up to orders of magnitude and effective in finding hidden FDs from dirty data.
PVLDB is part of the VLDB Endowment Inc.
Privacy Policy