Volume 16, No. 8

Towards Migration-Free Just-In-Case Data Archival for Future Cloud Data Lakes

Eugenio Marinelli, Yiqing Yan, Virginie Magnone, Charlotte Dumargne, Pascal Barbry, Thomas Heinis, Raja Appuswamy


Given the growing adoption of AI, cloud data lakes face the need to support cost-effective "just-in-case" data archival over long time periods to meet regulatory compliance requirements. Unfortunately, current media technologies suffer from fundamental issues that make, or will soon make, cost-effective data archival infeasible. In this paper, we present a vision for redesigning the archival tier of cloud data lakes based on a novel, obsolescence-free storage medium: synthetic DNA. In doing so, we make two contributions: (i) we highlight the challenges in using DNA for data archival and list several open research problems, and (ii) we outline OligoArchive-DSM (OA-DSM), an end-to-end DNA storage pipeline that we are developing to demonstrate the feasibility of our vision.

