Paper title

MARIA: Multiple-alignment $r$-index with aggregation

Authors

Adrián Goga, Andrej Baláž, Alessia Petescia, Travis Gagie

Abstract

There now exist compact indexes that can efficiently list all the occurrences of a pattern in a dataset consisting of thousands of genomes, or even all the occurrences of all the pattern's maximal exact matches (MEMs) with respect to the dataset. Unless we are lucky and the pattern is specific to only a few genomes, however, we could be swamped by hundreds of matches (or even hundreds per MEM), only to discover that most or all of the matches are to substrings that occupy the same few columns in a multiple alignment. To address this issue, in this paper we present a simple and compact data index MARIA that stores a multiple alignment such that, given the position of one match of a pattern (or a MEM or other substring of a pattern) and its length, we can quickly list all the distinct columns of the multiple alignment where matches start.
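
To make the query in the abstract concrete, the following is a minimal sketch of a naive baseline, not MARIA's data structure: it scans every row of a toy multiple alignment, finds each occurrence of a pattern, and reports only the distinct alignment columns where occurrences start. The function name `distinct_start_columns`, the toy alignment, and the use of `-` as the gap character are all assumptions made for illustration.

```python
def distinct_start_columns(alignment, pattern):
    """Naive baseline (hypothetical helper, not the MARIA index):
    return the sorted alignment columns at which some row has an
    occurrence of `pattern` starting, ignoring gap characters."""
    columns = set()
    for row in alignment:
        # col_of[i] is the alignment column of the i-th ungapped character.
        col_of = [c for c, ch in enumerate(row) if ch != "-"]
        seq = "".join(row[c] for c in col_of)
        # Enumerate all (possibly overlapping) occurrences in this row.
        start = seq.find(pattern)
        while start != -1:
            columns.add(col_of[start])
            start = seq.find(pattern, start + 1)
    return sorted(columns)


# Toy alignment of three sequences (assumed data, "-" marks a gap).
alignment = [
    "ACGT-ACGGA",
    "ACGTTACG-A",
    "AC-TTACGGA",
]

# Five occurrences of "ACG" across the rows collapse to two columns.
print(distinct_start_columns(alignment, "ACG"))  # [0, 5]
```

This baseline takes time proportional to the size of the whole alignment per query, which is exactly the cost a compact index is meant to avoid; it only shows what output the query is expected to produce.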
