Paper Title


Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of Discrete Distributions

Paper Authors

Yi Hao, Alon Orlitsky

Paper Abstract


The profile of a sample is the multiset of its symbol frequencies. We show that for samples of discrete distributions, profile entropy is a fundamental measure unifying the concepts of estimation, inference, and compression. Specifically, profile entropy a) determines the speed of estimating the distribution relative to the best natural estimator; b) characterizes the rate of inferring all symmetric properties compared with the best estimator over any label-invariant distribution collection; c) serves as the limit of profile compression, for which we derive optimal near-linear-time block and sequential algorithms. To further our understanding of profile entropy, we investigate its attributes, provide algorithms for approximating its value, and determine its magnitude for numerous structural distribution families.
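The abstract defines the profile of a sample as the multiset of its symbol frequencies, and profile entropy as the entropy of that (random) profile. The sketch below is only illustrative and is not the paper's algorithm: it computes a sample's profile directly, and estimates the profile entropy of a known distribution by plain Monte Carlo. The function names and the Monte Carlo approach are assumptions made for this example.

```python
# Minimal sketch (assumed illustration, not the authors' method): compute a
# sample's profile and crudely estimate profile entropy via Monte Carlo.
import math
import random
from collections import Counter


def profile(sample):
    """Return the profile: the multiset of symbol frequencies, as a sorted tuple."""
    freqs = Counter(sample).values()      # frequency of each observed symbol
    return tuple(sorted(freqs))           # order-invariant encoding of the multiset


def monte_carlo_profile_entropy(dist, n, trials=20000, seed=0):
    """Estimate H(profile of an n-sample from dist) in bits by plain Monte Carlo.

    dist: list of (symbol, probability) pairs summing to 1.
    The plug-in estimate below is biased for small `trials`; it is only a sketch.
    """
    rng = random.Random(seed)
    symbols = [s for s, _ in dist]
    weights = [p for _, p in dist]
    counts = Counter()
    for _ in range(trials):
        sample = rng.choices(symbols, weights=weights, k=n)
        counts[profile(sample)] += 1
    return -sum((c / trials) * math.log2(c / trials) for c in counts.values())


if __name__ == "__main__":
    # Profile of a concrete sample: 'a' appears 5 times, 'b' and 'r' twice, 'c' and 'd' once.
    print(profile("abracadabra"))         # -> (1, 1, 2, 2, 5)
    # Estimated profile entropy of a 10-sample from the uniform distribution over 4 symbols.
    uniform4 = [(s, 0.25) for s in "abcd"]
    print(round(monte_carlo_profile_entropy(uniform4, n=10), 3))
```

Because the profile discards symbol labels, it is the natural sufficient statistic for the symmetric (label-invariant) properties discussed in the abstract; the Monte Carlo estimate above is only a brute-force stand-in for the approximation algorithms the paper derives.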
