Paper title
The design and construction of reference pangenome graphs
Authors
Abstract
Recent advances in sequencing technologies enable the assembly of individual genomes to reference quality. How to integrate multiple genomes from the same species, and how to make the integrated representation accessible to biologists, remains an open challenge. Here we propose a graph-based data model and associated formats to represent multiple genomes while preserving the coordinates of the linear reference genome. We implemented our ideas in the minigraph toolkit and demonstrate that we can efficiently construct a pangenome graph and compactly encode tens of thousands of structural variants missing from the current reference genome.