Paper Title

Statistical Robustness of Empirical Risks in Machine Learning

Authors

Shaoyan Guo, Huifu Xu, Liwei Zhang

Abstract

This paper studies convergence of empirical risks in reproducing kernel Hilbert spaces (RKHS). A conventional assumption in the existing research is that the empirical training data contain no noise, but this assumption may not be satisfied in some practical circumstances. Consequently, the existing convergence results do not provide a guarantee as to whether empirical risks based on empirical data are reliable when the data contain noise. In this paper, we fill in the gap in a few steps. First, we derive moderate sufficient conditions under which the expected risk changes stably (continuously) against small perturbations of the probability distribution of the underlying random variables, and demonstrate how the cost function and the kernel affect this stability. Second, we examine the difference between the laws of the statistical estimators of the expected optimal loss based on pure data and on contaminated data, using the Prokhorov metric and the Kantorovich metric, and derive some qualitative and quantitative statistical robustness results. Third, we identify appropriate metrics under which the statistical estimators are uniformly asymptotically consistent. These results provide theoretical grounding for analysing asymptotic convergence and examining the reliability of the statistical estimators in a number of well-known machine learning models.
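For orientation, the objects named in the abstract admit standard formulations; the notation below is a plausible reading of the abstract, not taken from the paper itself. With a cost function $c$, an RKHS $\mathcal{F}$, and a joint distribution $P$ of the data $(\mathbf{x}, y)$, the expected risk and the expected optimal loss are typically written as
\[
R_P(f) = \mathbb{E}_P\big[c(f(\mathbf{x}), y)\big], \qquad
\vartheta(P) = \inf_{f \in \mathcal{F},\, \|f\|_{\mathcal{F}} \le r} R_P(f),
\]
and replacing $P$ by the empirical distribution $P_N$ of the (possibly contaminated) sample gives the empirical risk and the statistical estimator $\vartheta(P_N)$. The two probability metrics mentioned are, in their standard forms,
\[
d_K(P, Q) = \sup_{\mathrm{Lip}(g) \le 1} \big|\mathbb{E}_P[g] - \mathbb{E}_Q[g]\big| \quad \text{(Kantorovich)},
\]
\[
\pi(P, Q) = \inf\{\varepsilon > 0 : P(A) \le Q(A^{\varepsilon}) + \varepsilon \ \text{for all Borel sets } A\} \quad \text{(Prokhorov)},
\]
where $A^{\varepsilon}$ is the $\varepsilon$-neighbourhood of $A$. Statistical robustness in this sense asks that the law of $\vartheta(Q_N)$, computed from contaminated samples drawn from $Q$, stays close to the law of $\vartheta(P_N)$, computed from pure samples drawn from $P$, whenever the chosen metric between $P$ and $Q$ is small, uniformly in the sample size $N$.

As a concrete illustration of the quantity being compared (the script below is our own sketch with hypothetical parameters, not the authors' experiment), one can contrast the empirical optimal loss of a Gaussian-kernel ridge regressor fitted to pure samples with the loss obtained when a small fraction of the labels is grossly contaminated; it requires numpy and scikit-learn:

    # Illustrative sketch, not from the paper: empirical optimal loss of
    # kernel ridge regression on pure versus contaminated training data.
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(0)

    def empirical_optimal_loss(x, y):
        """Fit a Gaussian-kernel (RKHS) regressor and return its training risk."""
        model = KernelRidge(alpha=0.1, kernel="rbf", gamma=1.0)
        model.fit(x, y)
        return np.mean((model.predict(x) - y) ** 2)

    N = 200
    x = rng.uniform(-2.0, 2.0, size=(N, 1))
    y_pure = np.sin(2.0 * x[:, 0]) + 0.1 * rng.standard_normal(N)

    # Contaminate a small fraction of the labels with gross outliers.
    y_contaminated = y_pure.copy()
    idx = rng.choice(N, size=N // 20, replace=False)
    y_contaminated[idx] += 5.0 * rng.standard_normal(idx.size)

    print("empirical optimal loss (pure):        ", empirical_optimal_loss(x, y_pure))
    print("empirical optimal loss (contaminated):", empirical_optimal_loss(x, y_contaminated))

The gap between the two printed values is what the paper's qualitative and quantitative robustness results control, in terms of the distance between the pure and contaminated data distributions.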
