Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization
Abstract
Effective query optimization is the backbone of relational database systems; query optimizers use all manners of tricks, from specialized auxiliary data structures over caching and batch processing to the use of secondary metadata. Nevertheless, these systems do not fully exploit the potential of data dependencies, such as functional dependencies, although dozens of dependency-based query optimization techniques exist. This disregard occurs because the required data dependencies are, in practice, hard to find and hard to maintain. This paper presents our vision of a workload-driven, lazy dependency discovery system for query optimization. We propose a lightweight approach that identifies relevant data dependency candidates based on actually executed query plans, validates the candidates dynamically against the database, and maintains the results using different strategies that also exploit concepts of columnar DBMSs. Our evaluation demonstrates the feasibility of this approach and the potential of dependency-based optimizations, using a prototypical implementation in Hyrise that implements three exemplary dependency-based optimization techniques. For example, after automatic dependency discovery, these optimizations reduce the Join Order Benchmark’s execution time by 27 %.