Paper Title

Deduplicating Training Data Mitigates Privacy Risks in Language Models

Paper Authors

Nikhil Kandpal, Eric Wallace, Colin Raffel

Abstract

Past work has shown that large language models are susceptible to privacy attacks, where adversaries generate sequences from a trained model and detect which sequences are memorized from the training set. In this work, we show that the success of these attacks is largely due to duplication in commonly used web-scraped training sets. We first show that the rate at which language models regenerate training sequences is superlinearly related to a sequence's count in the training set. For instance, a sequence that is present 10 times in the training data is on average generated ~1000 times more often than a sequence that is present only once. We next show that existing methods for detecting memorized sequences have near-chance accuracy on non-duplicated training sequences. Finally, we find that after applying methods to deduplicate training data, language models are considerably more secure against these types of privacy attacks. Taken together, our results motivate an increased focus on deduplication in privacy-sensitive applications and a reevaluation of the practicality of existing privacy attacks.
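The paper's own deduplication tooling is not reproduced here; as a rough illustration of the idea of removing repeated training sequences before training, a minimal exact-match sketch is shown below (the function name and hashing choice are illustrative assumptions, and real pipelines typically also handle near-duplicate and substring-level repeats):

```python
import hashlib

def dedup_exact(sequences):
    """Keep only the first occurrence of each exact training sequence.

    Simplified sketch: exact-match deduplication via hashing. The paper's
    experiments rely on more sophisticated deduplication methods.
    """
    seen = set()
    unique = []
    for seq in sequences:
        # Hash the normalized text so the seen-set stays compact.
        digest = hashlib.sha256(seq.strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(seq)
    return unique

# Example: the repeated sentence is kept only once.
corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog.",
    "An entirely different training sentence.",
]
print(dedup_exact(corpus))  # 2 unique sequences remain
```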
