Paper title

Information geometry for phylogenetic trees

Authors

Garba, Maryam K.; Nye, Tom M. W.; Lueg, Jonas; Huckemann, Stephan F.

Abstract

We propose a new space of phylogenetic trees, which we call wald space. The motivation is to develop a space suitable for statistical analysis of phylogenies, but with a geometry based on more biologically principled assumptions than existing spaces: in wald space, trees are close if they induce similar distributions on genetic sequence data. As a point set, wald space contains the previously developed Billera-Holmes-Vogtmann (BHV) tree space; it also contains disconnected forests, like the edge-product (EP) space, but without certain singularities of the EP space. We investigate two related geometries on wald space. The first is the geometry of the Fisher information metric of character distributions induced by the two-state symmetric Markov substitution process on each tree. Infinitesimally, the metric is proportional to the Kullback-Leibler divergence or, equivalently, as we show, to any f-divergence. The second geometry is obtained analogously but using a related continuous-valued Gaussian process on each tree, and it can be viewed as the trace metric of the affine-invariant metric for covariance matrices. We derive a gradient descent algorithm to project from the ambient space of covariance matrices to wald space. For both geometries we derive computational methods to compute geodesics in polynomial time, and we show numerically that the two information geometries (discrete and continuous) are very similar. In particular, geodesics are approximated extrinsically. Comparison with the BHV geometry shows that our canonical and biologically motivated space is substantially different.
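
For readers unfamiliar with the affine-invariant geometry of covariance matrices mentioned in the abstract, the following is a minimal sketch of the standard affine-invariant distance between two symmetric positive definite matrices, d(A, B) = sqrt(sum_i log^2(lambda_i)), where the lambda_i are the eigenvalues of A^{-1}B. It is restricted to the 2x2 case so that it needs only the standard library. The function name is hypothetical, and the sketch illustrates the ambient covariance-matrix geometry only; it is not the paper's algorithm for geodesics in wald space.

```python
import math


def affine_invariant_distance_2x2(a, b):
    """Affine-invariant distance between two 2x2 SPD matrices (nested lists).

    Illustrative helper (assumed name, not from the paper):
    d(a, b) = sqrt(log(l1)^2 + log(l2)^2), with l1, l2 the eigenvalues of a^{-1} b.
    """
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    det_a = a11 * a22 - a12 * a21
    det_b = b11 * b22 - b12 * b21

    # Trace and determinant of a^{-1} b, using the closed-form 2x2 inverse of a.
    trace = (a22 * b11 - a12 * b21 - a21 * b12 + a11 * b22) / det_a
    det = det_b / det_a

    # Eigenvalues of a^{-1} b are real and positive when a and b are SPD.
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam1 = (trace + disc) / 2.0
    lam2 = det / lam1  # product of the eigenvalues equals the determinant

    return math.sqrt(math.log(lam1) ** 2 + math.log(lam2) ** 2)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    stretched = [[2.0, 0.0], [0.0, 1.0]]
    print(affine_invariant_distance_2x2(identity, stretched))
```

The distance is unchanged when both matrices are transformed as G A G^T and G B G^T for any invertible G, which is the invariance property that gives the metric its name.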
