Volume 18, No. 11
Relational Data Models for Genetic VCF data
Abstract
The Variant Call Format (VCF) and its binary counterpart (BCF) are commonly used in bioinformatics for storing gene sequence data. While VCF files provide compact storage, they require specific tools and scripts for querying, thereby missing the rich functionality arsenal of database management systems and their potential for integration in multiomics pipelines. In this paper, we leverage Relational Database Management Systems (RDBMS) to enhance efficiency and flexibility in storing and querying large-scale genetic datasets. We map the VCF file structure to narrow, wide, and array-based data models that are further refined using JSON data structures, resulting in eight data models. Our experimental evaluation shows that RDBMS provide competitive performance in comparison with specialized state-of-the-art tools while making full-fledged database capabilities available for genetic data analysis.
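To make the distinction between a narrow and a wide layout concrete, the following is a minimal sketch using Python's built-in sqlite3 module. It is not the schema used in the paper: the table names, column names, sample identifiers (S1, S2), and genotype values are all hypothetical, chosen only to illustrate how the same per-sample genotype calls could be stored as one row per (variant, sample) pair versus one row per variant with one column per sample. The array-based and JSON-refined models are not covered here.

```python
import sqlite3

# Toy variant records: (chrom, pos, ref, alt, {sample: genotype}).
# All values are invented for illustration; they do not come from the paper.
VARIANTS = [
    ("1", 1000, "A", "G", {"S1": "0/1", "S2": "0/0"}),
    ("1", 2000, "C", "T", {"S1": "1/1", "S2": "0/1"}),
]
SAMPLES = ("S1", "S2")


def build(conn):
    """Create one hypothetical narrow table and one hypothetical wide table."""
    # Narrow layout: one row per (variant, sample) pair.
    conn.execute(
        "CREATE TABLE narrow "
        "(chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, sample TEXT, gt TEXT)"
    )
    # Wide layout: one row per variant, one genotype column per sample.
    conn.execute(
        "CREATE TABLE wide "
        "(chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, S1 TEXT, S2 TEXT)"
    )
    for chrom, pos, ref, alt, calls in VARIANTS:
        for sample, gt in calls.items():
            conn.execute(
                "INSERT INTO narrow VALUES (?, ?, ?, ?, ?, ?)",
                (chrom, pos, ref, alt, sample, gt),
            )
        conn.execute(
            "INSERT INTO wide VALUES (?, ?, ?, ?, ?, ?)",
            (chrom, pos, ref, alt, calls["S1"], calls["S2"]),
        )


def carriers_narrow(conn, pos):
    """Samples with a non-reference genotype at pos, filtered in SQL."""
    rows = conn.execute(
        "SELECT sample FROM narrow WHERE pos = ? AND gt != '0/0' ORDER BY sample",
        (pos,),
    )
    return [r[0] for r in rows]


def carriers_wide(conn, pos):
    """Same question on the wide table: fetch the row, then scan its columns."""
    row = conn.execute("SELECT S1, S2 FROM wide WHERE pos = ?", (pos,)).fetchone()
    return [name for name, gt in zip(SAMPLES, row) if gt != "0/0"]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    print(carriers_narrow(conn, 1000))
    print(carriers_wide(conn, 2000))
```

In this sketch the narrow table grows with the number of samples per variant but lets the filter run entirely in SQL, while the wide table stays at one row per variant but needs a column per sample. Which trade-off wins at scale is the kind of question the experimental evaluation addresses.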
PVLDB is part of the VLDB Endowment Inc.