Paper title
Probability-turbulence divergence: A tunable allotaxonometric instrument for comparing heavy-tailed categorical distributions
Paper authors
Paper abstract
Real-world complex systems often comprise many distinct types of elements as well as many more types of networked interactions between elements. When the relative abundances of types can be measured well, we often observe heavy-tailed categorical distributions for type frequencies. For the comparison of type frequency distributions of two systems, or of a system with itself at different points in time (a facet of allotaxonometry), a great range of probability divergences is available. Here, we introduce and explore `probability-turbulence divergence', a tunable, straightforward, and interpretable instrument for comparing normalizable categorical frequency distributions. We model probability-turbulence divergence (PTD) after rank-turbulence divergence (RTD). While probability-turbulence divergence is more limited in application than rank-turbulence divergence, it is more sensitive to changes in type frequency. We build allotaxonographs to display probability turbulence, incorporating a way to visually accommodate zero probabilities for `exclusive types', which are types that appear in only one system. We explore comparisons of example distributions taken from literature, social media, and ecology. We show how probability-turbulence divergence either explicitly or functionally generalizes many existing kinds of distances and measures, including, as special cases, $L^{(p)}$ norms, the Sørensen-Dice coefficient (the $F_{1}$ statistic), and the Hellinger distance. We discuss similarities with the generalized entropies of Rényi and Tsallis, and the diversity indices (or Hill numbers) from ecology. We close with thoughts on open problems concerning the optimization of the tuning of rank- and probability-turbulence divergence.
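The abstract names three existing measures as special cases of PTD but does not state the PTD formula itself, so the sketch below illustrates only those three measures in their standard textbook forms. It is not the paper's instrument. The function names (`align`, `lp_distance`, `hellinger`, `sorensen_dice`) and the toy word counts are hypothetical, chosen for illustration. The sketch also shows one simple way to treat exclusive types: assign them zero probability in the system where they do not appear.

```python
import math


def align(counts_a, counts_b):
    """Normalize two count dicts over the union of their types.

    Exclusive types (present in only one system) get probability 0
    in the other system. Assumption: plain relative frequencies.
    """
    types = sorted(set(counts_a) | set(counts_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    p = [counts_a.get(t, 0) / total_a for t in types]
    q = [counts_b.get(t, 0) / total_b for t in types]
    return types, p, q


def lp_distance(p, q, order=1.0):
    """Standard L^p distance between two aligned probability vectors."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)


def hellinger(p, q):
    """Standard Hellinger distance, bounded between 0 and 1."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared)


def sorensen_dice(counts_a, counts_b):
    """Set-based Sørensen-Dice coefficient on presence/absence of types."""
    a, b = set(counts_a), set(counts_b)
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical toy data: "whale" and "sea" are exclusive types.
system_1 = {"the": 6, "of": 3, "whale": 1}
system_2 = {"the": 5, "of": 3, "sea": 2}

types, p, q = align(system_1, system_2)
l1 = lp_distance(p, q, order=1.0)
h = hellinger(p, q)
dice = sorensen_dice(system_1, system_2)
```

Here the set-based Sørensen-Dice coefficient ignores frequencies and depends only on which types are shared, whereas the $L^{(p)}$ and Hellinger distances respond to how much probability mass moves between types.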