论文标题

字典匹配的图案掩盖

Pattern Masking for Dictionary Matching

论文作者

Charalampopoulos, Panagiotis, Chen, Huiping, Christen, Peter, Loukides, Grigorios, Pisanti, Nadia, Pissis, Solon P., Radoszewski, Jakub

论文摘要

在用于字典匹配(PMDM)问题的图案掩盖中,我们得到了$ d $ strings的字典$ \ mathcal {d} $,每个长度$ \ ell $,一个查询字符串$ q $ lengthy $ \ ell $和一个正整数$ z $ $ k \ subseteq \ {1,\ ldots,\ ell \} $,因此,如果$ q [i] $,则为k $中的所有$ i \,由通配符代替,然后由$ q $匹配至少$ z $ strings,从$ \ m nathcal {d} $中。 PMDM问题在于大规模现实世界中的两个重要应用程序的核心:记录包含敏感信息的数据库和查询术语删除的链接。在这两个应用程序中,求解PMDM允许提供数据实用性保证,而不是现有方法。 我们首先通过减少了众所周知的$ k $确定问题来表明,即使是二进制字母的字符串,PMDM问题的决策版本也是NP的完整问题。我们提出了PMDM的数据结构,该数据结构在$ \ Mathcal {d} $上回答$ \ MATHCAL {o}(2^{\ ell/2}(2^{\ ell/2}(2^{\ ell/2}+τ)\ ell)$,并且需要空间$ \ MATHCAL {O}(2^{\ ell} d^2/τ^2+2^{\ ell/2} d)$,对于任何参数$τ\ in [1,d] $。我们还从更实际的角度解决了问题。我们显示$ \ Mathcal {o}(((d \ ell)^{k/3}+d \ ell)$ - 时间和$ \ Mathcal {o}(d \ ell)$ - 如果$ k = | k = | k | k | k | k | k | k | k | k |我们将确切的算法概括为同时掩盖多个查询字符串。我们通过显示PMDM和最小工会问题之间的双向多项式时间减少来补充我们的结果[Chlamtáč等,Soda 2017]。这给出了一个多项式时$ \ MATHCAL {O}(d^{1/4+ε})$ - PMDM的近似算法,在合理的复杂性下,该算法很紧。

In the Pattern Masking for Dictionary Matching (PMDM) problem, we are given a dictionary $\mathcal{D}$ of $d$ strings, each of length $\ell$, a query string $q$ of length $\ell$, and a positive integer $z$, and we are asked to compute a smallest set $K\subseteq\{1,\ldots,\ell\}$, so that if $q[i]$, for all $i\in K$, is replaced by a wildcard, then $q$ matches at least $z$ strings from $\mathcal{D}$. The PMDM problem lies at the heart of two important applications featured in large-scale real-world systems: record linkage of databases that contain sensitive information, and query term dropping. In both applications, solving PMDM allows for providing data utility guarantees as opposed to existing approaches. We first show, through a reduction from the well-known $k$-Clique problem, that a decision version of the PMDM problem is NP-complete, even for strings over a binary alphabet. We present a data structure for PMDM that answers queries over $\mathcal{D}$ in time $\mathcal{O}(2^{\ell/2}(2^{\ell/2}+τ)\ell)$ and requires space $\mathcal{O}(2^{\ell}d^2/τ^2+2^{\ell/2}d)$, for any parameter $τ\in[1,d]$. We also approach the problem from a more practical perspective. We show an $\mathcal{O}((d\ell)^{k/3}+d\ell)$-time and $\mathcal{O}(d\ell)$-space algorithm for PMDM if $k=|K|=\mathcal{O}(1)$. We generalize our exact algorithm to mask multiple query strings simultaneously. We complement our results by showing a two-way polynomial-time reduction between PMDM and the Minimum Union problem [Chlamtáč et al., SODA 2017]. This gives a polynomial-time $\mathcal{O}(d^{1/4+ε})$-approximation algorithm for PMDM, which is tight under plausible complexity conjectures.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源