Paper Title
Differentially Private Language Models Benefit from Public Pre-training
Paper Authors
Paper Abstract
Language modeling is a keystone task in natural language processing. When training a language model on sensitive information, differential privacy (DP) allows us to quantify the degree to which our private data is protected. However, training algorithms which enforce differential privacy often lead to degradation in model quality. We study the feasibility of learning a language model which is simultaneously high-quality and privacy preserving by tuning a public base model on a private corpus. We find that DP fine-tuning boosts the performance of language models in the private domain, making the training of such models possible.
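The abstract describes fine-tuning a publicly pre-trained language model on a private corpus under differential privacy. As a rough illustration of what such a pipeline can look like, the sketch below loads a public GPT-2 checkpoint and performs one DP-SGD-style update (per-example gradient clipping plus Gaussian noise). The paper does not specify an implementation; the libraries (PyTorch, HuggingFace Transformers), the placeholder "private" texts, and the hyperparameter values are assumptions for illustration only.

```python
# Minimal sketch: DP fine-tuning of a public base model on private text.
# Assumes PyTorch + HuggingFace Transformers; hyperparameters are placeholders.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # publicly pre-trained base model
model.train()

# Placeholder stand-in for the sensitive private corpus.
private_texts = ["example private sentence one.", "example private sentence two."]
batch = [tokenizer(t, return_tensors="pt")["input_ids"][0] for t in private_texts]

clip_norm = 1.0         # per-example gradient norm bound C
noise_multiplier = 1.0  # sigma; together with C and sampling, sets (epsilon, delta)
lr = 1e-4
params = [p for p in model.parameters() if p.requires_grad]

# Accumulate clipped per-example gradients.
accum = [torch.zeros_like(p) for p in params]
for ids in batch:
    model.zero_grad()
    loss = model(input_ids=ids.unsqueeze(0), labels=ids.unsqueeze(0)).loss
    loss.backward()
    grads = [p.grad.detach() for p in params]
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)  # clip to norm <= C
    for a, g in zip(accum, grads):
        a += g * scale

# Add Gaussian noise scaled to the clipping bound, then take one SGD step.
with torch.no_grad():
    for p, a in zip(params, accum):
        noise = torch.normal(0.0, noise_multiplier * clip_norm, size=a.shape)
        p -= lr * (a + noise) / len(batch)
```

In practice one would iterate this update over many batches and track the cumulative privacy loss with a privacy accountant (e.g., the moments accountant) rather than hand-rolling the loop as above.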