go back
go back
Volume 18, No. 12
Can Surrogate Keys Negatively Impact Data Quality?
Abstract
Surrogate keys are now extensively utilized by database designers to implement keys in SQL tables. They are straightforward, easy to understand, enable efficient access, and are often considered a sufficient guarantee of data integrity despite lacking any real-world semantic meaning. In spite of all their benefits, one might wonder whether surrogate keys can negatively impact data quality. IT developers who rely exclusively on surrogate keys when designing database schemas may be tempted to not encode natural keys, as they are perceived as complex to manage at the application level. In such settings, surrogate keys allow the presence of so-called artificial unicity , a complex form of redundancy that can be propagated through foreign keys, and other underlying data-quality issues. In the presence of artificial unicity, most data cleaning techniques, especially unsupervised, are likely to fail, making data preparation and analytics very challenging. For relational databases implemented with surrogate keys but no natural keys, we developed RED2Hunt (RElational Databases REDundancy Hunting), a human-in-the-loop framework for identifying hidden redundancy and, if problems occur, clean the database. The framework was implemented on top of PostgreSQL within an eponym web-based platform to guide the expert through its application. In this paper, we present a demonstration of the RED2Hunt tool through three interactive scenarios on a polluted instance of the publicly available Perfect Pet database. During the demonstration, the visitor can take on one of two roles in the Perfect Pet database: a domain expert or a data scientist. As a domain expert, she will interact with RED2Hunt, for example to elicit natural keys, from simple yet very intuitive visualizations of tables’ attributes. As a data scientist, she will explore two simple scenarios—executing SQL queries or applying learning models—on both the initial and cleaned databases to grasp the benefits of the approach.
PVLDB is part of the VLDB Endowment Inc.
Privacy Policy