论文标题
总结分歧串序列,并应用于链式请愿书
Summarizing Diverging String Sequences, with Applications to Chain-Letter Petitions
论文作者
论文摘要
在各种情况下,对算法找到了字符串之间的最佳对齐方式,或找到一系列字符串的简约摘要,以解决各种有趣的应用程序。在本文中,我们考虑了链信,其中包含随着字母繁殖而添加的一系列签名序列。链信(单端生长,差异和突变)所表现出的特征的异常星座使它们的传播,因此既有独特又富含的相应重建问题。在这里,受这些连锁字母的启发,我们正式定义了计算一组散布串序列的最佳摘要的问题。从这些名称序列的集合中,每个序列都与代表字母的真实传播的未知树$ t $的分支相对应,我们是否可以有效,准确地重建一棵树$ t'\大约t $?在本文中,当序列的数量很少时,我们为此汇总问题提供了有效的精确算法。对于较大的序列,我们证明了硬度并提供有效的启发式算法。我们对被选为模拟真实链字母的合成数据集进行了评估,这表明我们的算法具有比以前的方法更具竞争力或更好的方法,并且它也接近在这些合成数据集中找到真正的树。
Algorithms to find optimal alignments among strings, or to find a parsimonious summary of a collection of strings, are well studied in a variety of contexts, addressing a wide range of interesting applications. In this paper, we consider chain letters, which contain a growing sequence of signatories added as the letter propagates. The unusual constellation of features exhibited by chain letters (one-ended growth, divergence, and mutation) make their propagation, and thus the corresponding reconstruction problem, both distinctive and rich. Here, inspired by these chain letters, we formally define the problem of computing an optimal summary of a set of diverging string sequences. From a collection of these sequences of names, with each sequence noisily corresponding to a branch of the unknown tree $T$ representing the letter's true dissemination, can we efficiently and accurately reconstruct a tree $T' \approx T$? In this paper, we give efficient exact algorithms for this summarization problem when the number of sequences is small; for larger sets of sequences, we prove hardness and provide an efficient heuristic algorithm. We evaluate this heuristic on synthetic data sets chosen to emulate real chain letters, showing that our algorithm is competitive with or better than previous approaches, and that it also comes close to finding the true trees in these synthetic datasets.