论文标题
不对称数字系统的压缩最优性
Compression Optimality of Asymmetric Numeral Systems
论文作者
论文摘要
压缩也称为熵编码具有丰富而悠久的历史。但是,最近对多媒体互联网应用程序的爆炸(例如,电视会议和视频流)更新了对快速压缩的兴趣,该兴趣也尽可能地挤出了尽可能多的冗余。 2009年,Jarek Duda发明了他的不对称数字系统(ANS)。除了精美的数学结构外,它非常有效,并提供了剩余冗余非常低的压缩。 ANS适用于任何符号源统计信息。此外,ANS已成为IT行业中首选的压缩算法。但是,设计ANS实例需要随机选择其符号传播功能。因此,每个ANS实例提供的压缩率略有不同。 该论文研究了ANS的压缩最优性。它表明,对于任何概率分布的符号来源,ANS是最佳的(即编码和源的熵相等)。我们使用马尔可夫链来计算ANS状态概率。这使我们能够精确确定ANS压缩率。我们提出了两种算法,用于查找具有高压率的ANS实例。第一个探讨了状态概率近似值,以便选择具有更好的压缩率的ANS实例。第二算法是一种概率。它发现了ANS实例,其压缩率可以根据需要将其接近最佳速率。这是以牺牲内部随机``硬币''折腾的数字$θ$而完成的。算法复杂度为$ {\ cal o}(θl^3)$,其中$ l $是ANS状态的数量。如果我们使用快速矩阵反转,则可以将复杂性降低为$ {\ cal o}(θl\ log {l})$。如果算法是在量子计算机上实现的,则其复杂性变为$ {\ cal o}(θ(\ log {l})^3)$。
Compression also known as entropy coding has a rich and long history. However, a recent explosion of multimedia Internet applications (such as teleconferencing and video streaming for instance) renews an interest in fast compression that also squeezes out as much redundancy as possible. In 2009 Jarek Duda invented his asymmetric numeral system (ANS). Apart from a beautiful mathematical structure, it is very efficient and offers compression with a very low residual redundancy. ANS works well for any symbol source statistics. Besides, ANS has become a preferred compression algorithm in the IT industry. However, designing ANS instance requires a random selection of its symbol spread function. Consequently, each ANS instance offers compression with a slightly different compression rate. The paper investigates compression optimality of ANS. It shows that ANS is optimal (i.e. the entropies of encoding and source are equal) for any symbol sources whose probability distribution is described by natural powers of 1/2. We use Markov chains to calculate ANS state probabilities. This allows us to determine ANS compression rate precisely. We present two algorithms for finding ANS instances with high compression rates. The first explores state probability approximations in order to choose ANS instances with better compression rates. The second algorithm is a probabilistic one. It finds ANS instances, whose compression rate can be made as close to the best rate as required. This is done at the expense of the number $θ$ of internal random ``coin'' tosses. The algorithm complexity is ${\cal O}(θL^3)$, where $L$ is the number of ANS states. The complexity can be reduced to ${\cal O}(θL\log{L})$ if we use a fast matrix inversion. If the algorithm is implemented on quantum computer, its complexity becomes ${\cal O}(θ(\log{L})^3)$.