Paper Title
Sentence-level Privacy for Document Embeddings
Paper Authors
Paper Abstract
User language data can contain highly sensitive personal content. As such, it is imperative to offer users a strong and interpretable privacy guarantee when learning from their data. In this work, we propose SentDP: pure local differential privacy at the sentence level for a single user document. We propose a novel technique, DeepCandidate, that combines concepts from robust statistics and language modeling to produce high-dimensional, general-purpose $ε$-SentDP document embeddings. This guarantees that any single sentence in a document can be substituted with any other sentence while keeping the embedding $ε$-indistinguishable. Our experiments indicate that these private document embeddings are useful for downstream tasks like sentiment analysis and topic classification and even outperform baseline methods with weaker guarantees like word-level Metric DP.
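The substitution guarantee described in the abstract can be written out formally. What follows is a sketch based on the standard definition of pure $ε$-differential privacy, not a formula quoted from the paper. The notation is assumed for illustration: $\mathcal{M}$ is the randomized mechanism that maps a document to its embedding, $x$ and $x'$ are any two documents with the same number of sentences that differ in at most one sentence, and $S$ is any measurable set of possible embeddings.

$$
\Pr[\mathcal{M}(x) \in S] \;\le\; e^{ε} \, \Pr[\mathcal{M}(x') \in S]
$$

Under this reading, a smaller $ε$ forces the output distributions for $x$ and $x'$ closer together, so an observer of the embedding learns less about any individual sentence.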