论文标题

自适应学习可压缩字符串

Adaptive Learning of Compressible Strings

论文作者

Fici, Gabriele, Prezza, Nicola, Venturini, Rossano

论文摘要

假设Oracle知道我们未知的字符串$ S $,并且我们想确定。 Oracle可以回答“是$ s $的$ S $?”的查询。 1995年,Skiena和Sundaram表明,在最坏的情况下,任何算法都需要询问Oracle $σn/4 -o(n)$ Queries,以便能够重建隐藏的字符串,其中$σ$是$ S $和$ n $的字母的大小,并提供了$ s $ n $的长度,并提供了algorithm的费用。 $(σ-1)n+o(σ\ sqrt {n})$查询重建$ s $。我们论文的主要贡献是在琴弦可压缩的情况下改善上述上限。我们首先提出了一种通用算法,给定一个(可计算的)压缩机将字符串压缩到$τ$位,执行$ q = o(τ)$ substring查询;但是,该算法在指数时间内运行。因此,本文的第二部分集中于更时效的算法,其查询数量受特定的可压缩度量的限制。我们首先证明,可以使用$ q = o(RLE(σ+ \ log \ frac {n} {rle} {rle} {rle} {rle} {rle} {rle} {rle} {rle} {rle} {rle}))$ substring查询在线性时段和空间中的substring查询。然后,我们提出了一种算法,该算法在o(σg\ log n)$ substring查询中花费$ q \,并以$ o(n(\ log n + \ logσ) + q)$ time运行,其中$ g $是$ g $的最小直线程序的大小。

Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm needs to ask the oracle $σn/4 -O(n)$ queries in order to be able to reconstruct the hidden string, where $σ$ is the size of the alphabet of $S$ and $n$ its length, and gave an algorithm that spends $(σ-1)n+O(σ\sqrt{n})$ queries to reconstruct $S$. The main contribution of our paper is to improve the above upper-bound in the context where the string is compressible. We first present a universal algorithm that, given a (computable) compressor that compresses the string to $τ$ bits, performs $q=O(τ)$ substring queries; this algorithm, however, runs in exponential time. For this reason, the second part of the paper focuses on more time-efficient algorithms whose number of queries is bounded by specific compressibility measures. We first show that any string of length $n$ over an integer alphabet of size $σ$ with $rle$ runs can be reconstructed with $q=O(rle (σ+ \log \frac{n}{rle}))$ substring queries in linear time and space. We then present an algorithm that spends $q \in O(σg\log n)$ substring queries and runs in $O(n(\log n + \log σ)+ q)$ time using linear space, where $g$ is the size of a smallest straight-line program generating the string.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源