Paper Title
On the Importance of Architecture and Feature Selection in Differentially Private Machine Learning
Paper Authors
Paper Abstract
We study a pitfall in the typical workflow for differentially private machine learning. The use of differentially private learning algorithms in a "drop-in" fashion -- without accounting for the impact of differential privacy (DP) noise when choosing what feature engineering operations to use, what features to select, or what neural network architecture to use -- yields overly complex and poorly performing models. In other words, by anticipating the impact of DP noise, a simpler and more accurate alternative model could have been trained for the same privacy guarantee. We systematically study this phenomenon through theory and experiments. On the theory front, we provide an explanatory framework and prove that the phenomenon arises naturally from the addition of noise to satisfy differential privacy. On the experimental front, we demonstrate how the phenomenon manifests in practice using various datasets, types of models, tasks, and neural network architectures. We also analyze the factors that contribute to the problem and distill our experimental insights into concrete takeaways that practitioners can follow when training models with differential privacy. Finally, we propose privacy-aware algorithms for feature selection and neural network architecture search. We analyze their differential privacy properties and evaluate them empirically.
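The abstract attributes the phenomenon to the noise added to satisfy differential privacy. Below is a minimal NumPy sketch, not taken from the paper, of one mechanism by which this can arise: under the Gaussian mechanism with a fixed clipping norm and noise multiplier, per-coordinate noise is constant while the clipped gradient signal is spread across more coordinates, so a higher-dimensional "complex" model receives a much less informative update at the same privacy budget. The function name `dp_noisy_gradient` and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_noisy_gradient(grad, clip_norm, sigma):
    """Gaussian mechanism: clip the gradient to clip_norm (its L2
    sensitivity), then add isotropic Gaussian noise scaled to it."""
    scale = min(1.0, clip_norm / np.linalg.norm(grad))
    return grad * scale + rng.normal(0.0, sigma * clip_norm, size=grad.shape)

clip_norm, sigma = 1.0, 1.0  # identical privacy parameters in both runs
for d in (10, 10_000):       # "simple" vs. "complex" model dimension
    grad = rng.normal(size=d)
    grad /= np.linalg.norm(grad)          # unit-norm "true" gradient
    noisy = dp_noisy_gradient(grad, clip_norm, sigma)
    cos = grad @ noisy / np.linalg.norm(noisy)  # grad already has unit norm
    print(f"d={d:>6}: cosine(true, noisy gradient) = {cos:.3f}")
```

On a typical run, the cosine similarity between the true and privatized gradient drops from roughly 0.3 at d = 10 to near 0 at d = 10,000, illustrating why, at a fixed privacy guarantee, a simpler model can be the more accurate choice.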