论文标题

噪音重复频道的错误校正代码

Error-correcting Codes for Noisy Duplication Channels

论文作者

Tang, Yuanyuan, Farnoud, Farzad

论文摘要

由于其高数据密度和寿命,DNA成为满足增加数据存储需求的有前途的候选人。但是,与传统的存储介质相比,DNA中存储的数据受到数据存储管道中涉及的各种过程而产生的更广泛的错误范围。在本文中,我们考虑纠正给定长度k的精确和嘈杂的串联重复。精确的重复插入了该子弦之后的序列长度k的子字符串的副本,例如ACGT到ACGACGT,其中k = 3,而嘈杂的重复插入了替换噪声的副本,例如,ACGT到ACGATGT。具体而言,我们设计的代码可以纠正任何数量的精确复制和一个嘈杂的重复错误,在嘈杂的重复情况下,该副本距离原始距离为1。我们的构造依赖于恢复存储代码字的重复根。我们表征了重复错误在受影响序列的根和设计有效的代码中以纠正这些误差模式的方式。我们表明,所提出的构造在渐近上是最佳的,因为它具有与最佳代码相同的渐近率,仅纠正精确的重复。

Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data storage pipeline. In this paper, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length k. An exact duplication inserts a copy of a substring of length k of the sequence immediately after that substring, e.g., ACGT to ACGACGT, where k = 3, while a noisy duplication inserts a copy suffering from substitution noise, e.g., ACGT to ACGATGT. Specifically, we design codes that can correct any number of exact duplication and one noisy duplication errors, where in the noisy duplication case the copy is at Hamming distance 1 from the original. Our constructions rely upon recovering the duplication root of the stored codeword. We characterize the ways in which duplication errors manifest in the root of affected sequences and design efficient codes for correcting these error patterns. We show that the proposed construction is asymptotically optimal, in the sense that it has the same asymptotic rate as optimal codes correcting exact duplications only.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源