A Customized Text Sanitization Mechanism with Differential Privacy

Paper Title

A Customized Text Sanitization Mechanism with Differential Privacy

Authors

Huimin Chen, Fengran Mo, Yanhao Wang, Cen Chen, Jian-Yun Nie, Chengyu Wang, Jamie Cui

Abstract

As privacy issues are receiving increasing attention within the Natural Language Processing (NLP) community, numerous methods have been proposed to sanitize texts subject to differential privacy. However, the state-of-the-art text sanitization mechanisms based on metric local differential privacy (MLDP) do not apply to non-metric semantic similarity measures and cannot achieve good trade-offs between privacy and utility. To address the above limitations, we propose a novel Customized Text (CusText) sanitization mechanism based on the original $ε$-differential privacy (DP) definition, which is compatible with any similarity measure. Furthermore, CusText assigns each input token a customized output set of tokens to provide more advanced privacy protection at the token level. Extensive experiments on several benchmark datasets show that CusText achieves a better trade-off between privacy and utility than existing mechanisms. The code is available at https://github.com/sai4july/CusText.
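
The abstract does not spell out the sampling rule, so the following is only a minimal sketch of one standard way to obtain $ε$-DP over a finite output set with an arbitrary similarity score: the exponential mechanism. It assumes each input token has already been assigned an output set, that scores are rescaled to [0, 1] per input (so the sensitivity is at most 1), and that the guarantee is considered among tokens sharing the same output set. The names `char_overlap`, `sampling_probabilities`, and `sanitize` are hypothetical and are not taken from the CusText repository; the authors' actual implementation is at the URL above.

```python
import math
import random


def char_overlap(a, b):
    """Toy similarity (Jaccard overlap of character sets); a stand-in for any score."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def sampling_probabilities(token, output_set, score, epsilon):
    """Exponential-mechanism probabilities over `output_set` for one input token."""
    raw = [score(token, cand) for cand in output_set]
    lo, hi = min(raw), max(raw)
    # Rescale scores to [0, 1] so that the sensitivity is at most 1 (assumption).
    norm = [(s - lo) / (hi - lo) if hi > lo else 1.0 for s in raw]
    # Weight each candidate by exp(epsilon * u / (2 * sensitivity)), sensitivity = 1.
    weights = [math.exp(epsilon * s / 2.0) for s in norm]
    total = sum(weights)
    return [w / total for w in weights]


def sanitize(tokens, output_sets, score, epsilon, rng=None):
    """Replace each token by a sample drawn from its own (customized) output set."""
    rng = rng or random.Random()
    sanitized = []
    for tok in tokens:
        candidates = output_sets[tok]
        probs = sampling_probabilities(tok, candidates, score, epsilon)
        sanitized.append(rng.choices(candidates, weights=probs, k=1)[0])
    return sanitized


if __name__ == "__main__":
    shared = ["good", "great", "fine", "bad"]
    sets = {"good": shared, "bad": shared}
    print(sanitize(["good", "bad"], sets, char_overlap, epsilon=1.0))
```

In this sketch a smaller `epsilon` flattens the distribution toward uniform (more privacy, less utility), while a larger `epsilon` concentrates probability on the most similar candidates. Because only the scores enter the weights, the similarity function does not need to be a metric.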

Paper Title