Paper Title
Managing Reliability Skew in DNA Storage
Paper Authors
Paper Abstract
DNA is emerging as an increasingly attractive medium for data storage because of several important and unique advantages, most notably its unprecedented durability and density. While the technology is evolving rapidly, the prohibitive cost of reads and writes, together with the high frequency and peculiar nature of errors in DNA storage, poses a significant challenge to its adoption. In this work we make a novel observation: the probability of successfully recovering a given bit from any type of DNA-based storage system depends strongly on its physical location within the DNA molecule. In other words, when used as a storage medium, some parts of DNA molecules appear significantly more reliable than others. We show that large differences in reliability between different parts of DNA molecules lead to highly inefficient use of error-correction resources, and that commonly used techniques such as unequal error correction cannot be used to bridge the reliability gap between different locations in the context of DNA storage. We then propose two approaches to address the problem. The first approach is general and applies to any type of data; it stripes the data and ECC codewords across DNA molecules in a particular fashion such that the effects of errors are spread out evenly across different codewords and molecules, effectively de-biasing the underlying storage medium. The second approach is application-specific and seeks to leverage the underlying reliability bias through application-aware mapping of data onto DNA molecules, such that data requiring higher reliability is stored in more reliable locations, whereas data that tolerates lower reliability is stored in less reliable parts of DNA molecules. We show that the proposed data mapping can be used to achieve graceful degradation in the presence of high error rates, or to implement the concept of approximate storage in DNA.
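The striping idea in the first approach can be illustrated with a small sketch. This is not the layout from the paper, whose exact scheme the abstract does not specify; it is one simple diagonal arrangement, assuming a hypothetical per-position error profile and a number of molecules that is a multiple of the molecule length. The function names (`row_layout`, `diagonal_layout`, `expected_errors_per_codeword`) and the numbers are made up for illustration. The sketch compares a baseline in which each codeword occupies the same position in every molecule with a striped layout in which each codeword visits every position equally often.

```python
def row_layout(codeword, symbol, positions):
    """Baseline (assumed): symbol j of codeword i is stored in molecule j
    at position i, so a codeword always sits at the same position."""
    return symbol, codeword


def diagonal_layout(codeword, symbol, positions):
    """Striped (illustrative): symbol j of codeword i is stored in molecule j
    at position (i + j) mod positions, so a codeword cycles through positions."""
    return symbol, (codeword + symbol) % positions


def expected_errors_per_codeword(layout, error_rate, molecules):
    """Sum the per-position error rates over the symbols of each codeword.

    error_rate[p] is a hypothetical symbol error probability at position p.
    There is one codeword per position and one symbol per molecule.
    """
    positions = len(error_rate)
    totals = []
    for i in range(positions):
        total = 0.0
        for j in range(molecules):
            _, pos = layout(i, j, positions)
            total += error_rate[pos]
        totals.append(total)
    return totals


if __name__ == "__main__":
    # Hypothetical skew: error rate grows linearly along the molecule.
    rates = [0.01 * (p + 1) for p in range(8)]
    for name, layout in (("row", row_layout), ("diagonal", diagonal_layout)):
        totals = expected_errors_per_codeword(layout, rates, molecules=16)
        print(name, [round(t, 2) for t in totals])
```

Under these assumptions the baseline concentrates expected errors in the codewords placed at unreliable positions, so the ECC must be provisioned for the worst codeword, while the diagonal layout gives every codeword the same expected error count. A real design would also have to account for molecule loss and for insertion and deletion errors, which this sketch ignores.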