Paper Title
A Customized Text Sanitization Mechanism with Differential Privacy
Authors
Abstract
As privacy issues receive increasing attention within the Natural Language Processing (NLP) community, numerous methods have been proposed to sanitize texts under differential privacy. However, state-of-the-art text sanitization mechanisms based on metric local differential privacy (MLDP) do not apply to non-metric semantic similarity measures and fail to achieve a good trade-off between privacy and utility. To address these limitations, we propose a novel Customized Text (CusText) sanitization mechanism based on the original $ε$-differential privacy (DP) definition, which is compatible with any similarity measure. Furthermore, CusText assigns each input token a customized output set of tokens to provide more advanced privacy protection at the token level. Extensive experiments on several benchmark datasets show that CusText achieves a better trade-off between privacy and utility than existing mechanisms. The code is available at https://github.com/sai4july/CusText.
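The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of why plain $ε$-DP can work with an arbitrary (even asymmetric, non-metric) similarity score. It assumes an exponential-mechanism sampler, a standard way to get pure $ε$-DP from any bounded utility function, and a precomputed token-to-output-set map. The names `output_probs`, `sanitize`, `toy_sim`, and the toy vocabulary are hypothetical illustrations, not the authors' implementation (see the linked repository for that). Because utilities are normalized into [0, 1], any two inputs that share an output set yield output probabilities within a factor of $e^{ε}$ of each other. No triangle inequality is needed, which is the property MLDP relies on.

```python
import math
import random
from typing import Callable, Dict, List, Mapping, Sequence

Similarity = Callable[[str, str], float]


def output_probs(x: str, output_set: Sequence[str], sim: Similarity,
                 epsilon: float) -> Dict[str, float]:
    """Exponential-mechanism distribution over x's output set (assumed design).

    Raw scores from `sim` (any function, not necessarily a metric) are
    min-max normalized into [0, 1], so the utility's sensitivity is <= 1.
    Each candidate y then gets weight exp(epsilon * u(x, y) / 2).
    """
    raw = [sim(x, y) for y in output_set]
    lo, hi = min(raw), max(raw)
    span = hi - lo
    # Constant scores collapse to uniform sampling.
    utils = [(s - lo) / span if span > 0 else 0.0 for s in raw]
    weights = [math.exp(epsilon * u / 2.0) for u in utils]
    total = sum(weights)
    return {y: w / total for y, w in zip(output_set, weights)}


def sanitize(tokens: Sequence[str], output_sets: Mapping[str, Sequence[str]],
             sim: Similarity, epsilon: float, rng: random.Random) -> List[str]:
    """Replace every token by a sample drawn from its own output set.

    `output_sets` is a hypothetical precomputed map; every input token must
    have an entry, otherwise a KeyError is raised.
    """
    result = []
    for t in tokens:
        dist = output_probs(t, output_sets[t], sim, epsilon)
        result.append(rng.choices(list(dist), weights=list(dist.values()), k=1)[0])
    return result


def toy_sim(a: str, b: str) -> float:
    """Hypothetical asymmetric character-overlap score (not a metric)."""
    return len(set(a) & set(b)) / len(set(b))


group = ["good", "great", "fine", "nice"]
output_sets = {t: group for t in group}  # all four tokens share one output set
print(sanitize(["good", "nice", "fine"], output_sets, toy_sim, 2.0, random.Random(0)))
```

The $e^{ε}$ bound holds only among inputs mapped to the same output set. How those sets are built, and how large they are, is where a customized mechanism would trade privacy against utility; this sketch takes the sets as given.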