Paper Title

Genomic Compression with Read Alignment at the Decoder

Authors

Yotam Gershon, Yuval Cassuto

Abstract

We propose a new compression scheme for genomic data given as sequence fragments called reads. The scheme uses a reference genome at the decoder side only, freeing the encoder from the burdens of storing references and performing computationally costly alignment operations. The main ingredient of the scheme is a multi-layer code construction, delivering to the decoder sufficient information to align the reads, correct their differences from the reference, validate their reconstruction, and correct reconstruction errors. The core of the method is the well-known concept of distributed source coding with decoder side information, fortified by a generalized-concatenation code construction enabling efficient embedding of all the information needed for reliable reconstruction. We first present the scheme for the case of substitution errors only between the reads and the reference, and then extend it to support reads with a single deletion and multiple substitutions. A central tool in this extension is a new distance metric that is shown analytically to improve alignment performance over existing distance metrics.
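To make the idea of distributed source coding with decoder side information concrete, here is a minimal toy sketch. It is not the multi-layer generalized-concatenation construction of the paper. It assumes a 7-bit binary "read", a reference segment that is already aligned and differs from the read by at most one substitution, and a Hamming(7,4) code; the function names `syndrome`, `encode`, and `decode` are hypothetical. The encoder never sees the reference and sends only a 3-bit syndrome instead of the 7-bit read.

```python
def syndrome(bits):
    """3-bit syndrome under the Hamming(7,4) parity-check matrix,
    whose column i (1-indexed) is the binary representation of i."""
    s = 0
    for i, b in enumerate(bits, start=1):
        if b:
            s ^= i
    return s


def encode(read_bits):
    """Encoder side: no reference needed, output is the syndrome only."""
    return syndrome(read_bits)


def decode(read_syndrome, reference_bits):
    """Decoder side: recover the read from its syndrome and the reference.

    Assumption: the reference segment is already aligned and differs
    from the read in at most one position (a single substitution).
    """
    # Syndrome of (read XOR reference); nonzero value is the 1-indexed
    # position of the single substitution.
    diff = read_syndrome ^ syndrome(reference_bits)
    recovered = list(reference_bits)
    if diff:
        recovered[diff - 1] ^= 1
    return recovered


read = [1, 0, 1, 1, 0, 0, 1]
reference = [1, 0, 1, 0, 0, 0, 1]  # one substitution relative to the read
recovered = decode(encode(read), reference)
assert recovered == read
```

In this toy setting the decoder is told where the read aligns. In the scheme described in the abstract, the decoder must also find the alignment and validate the reconstruction, which is why additional code layers are needed beyond a single syndrome.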
