Paper Title
Insertion and Deletion Correction in Polymer-based Data Storage
Authors
Abstract
Synthetic polymer-based storage is a particularly promising candidate for coping with the ever-increasing demand for archival storage. It involves designing molecules of distinct masses to represent the respective bits $\{0,1\}$, followed by the synthesis of a polymer of molecular units that reflects the order of bits in the information string. Reading out the stored data requires a tandem mass spectrometer, which fragments the polymer into shorter substrings and provides their corresponding masses, from which the \emph{composition}, i.e., the number of $1$s and $0$s in the substring concerned, can be inferred. Prior works have dealt with the problem of unique string reconstruction from the multiset of the compositions of all its substrings, called the \emph{composition multiset}. This was accomplished either by determining which string lengths always allow unique reconstruction, or by formulating coding constraints that guarantee unique reconstruction for all string lengths. Additionally, error-correcting schemes have been suggested to deal with substitution errors caused by imprecise fragmentation during the readout process. This work builds on this research by generalizing the previously considered error models, which were mainly confined to substitutions of compositions. To this end, we define new error models that consider insertions of spurious compositions and deletions of existing ones, both of which corrupt the composition multiset. We analyze whether the reconstruction codebook proposed by Pattabiraman \emph{et al.} is indeed robust to such errors, and if not, propose new coding constraints to remedy this.
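
To make the objects in the abstract concrete, the following minimal Python sketch computes the composition multiset of a binary string and applies a single deletion or insertion to it. The function names (`composition_multiset`, `delete_composition`, `insert_composition`) and the representation of a composition as a `(zeros, ones)` pair are illustrative assumptions, not notation taken from the paper, and the error operations are only one plausible reading of the insertion and deletion models described above.

```python
from collections import Counter


def composition_multiset(s):
    """Return the multiset of (zeros, ones) over all contiguous substrings of s."""
    comps = Counter()
    n = len(s)
    for i in range(n):
        zeros = ones = 0
        # Extend the substring s[i..j] one symbol at a time.
        for j in range(i, n):
            if s[j] == "1":
                ones += 1
            else:
                zeros += 1
            comps[(zeros, ones)] += 1
    return comps


def delete_composition(comps, comp):
    """Assumed deletion model: one existing composition is lost."""
    if comps[comp] == 0:
        raise ValueError("composition not present in multiset")
    out = comps.copy()
    out[comp] -= 1
    if out[comp] == 0:
        del out[comp]  # keep the multiset free of zero-count entries
    return out


def insert_composition(comps, comp):
    """Assumed insertion model: one spurious composition is added."""
    out = comps.copy()
    out[comp] += 1
    return out


if __name__ == "__main__":
    clean = composition_multiset("101")
    print(sorted(clean.items()))
    # A string and its reversal always share the same composition multiset,
    # which is one reason reconstruction is only possible up to reversal.
    print(composition_multiset("001") == composition_multiset("100"))
    corrupted = insert_composition(delete_composition(clean, (1, 1)), (2, 0))
    print(sorted(corrupted.items()))
```

A length-$n$ string has $n(n+1)/2$ contiguous substrings, so an error-free composition multiset always has exactly that many elements. A single insertion or deletion changes this count, whereas a substitution does not.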