Paper Title
A Theory of the Risk for Optimization with Relaxation and its Application to Support Vector Machines
Paper Authors
Paper Abstract
In this paper we consider optimization with relaxation, an ample paradigm to make data-driven designs. This approach was previously considered by the same authors in Garatti and Campi (2019), a study that revealed a deep-seated connection between two concepts: risk (the probability of not satisfying a new, out-of-sample constraint) and complexity (according to a definition introduced in Garatti and Campi (2019)). This connection was shown to have profound implications for applications because it implies that the risk can be estimated from the complexity, a quantity that can be measured from the data without any knowledge of the data-generation mechanism. In the present work we establish new results. First, we expand the scope of Garatti and Campi (2019) so as to embrace a more general setup that covers various algorithms in machine learning. Then, we study classical support vector methods, including SVM (Support Vector Machine), SVR (Support Vector Regression) and SVDD (Support Vector Data Description), and derive new results on the ability of these methods to generalize. All results are valid for any finite size of the data set. As the sample size tends to infinity, we establish the unprecedented result that the risk approaches the ratio between the complexity and the cardinality of the data sample, regardless of the value of the complexity.
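The asymptotic claim above can be illustrated on a toy scenario problem. The sketch below is not the paper's algorithm: it assumes a one-dimensional relaxed design (choose the smallest threshold covering the data after discarding the k largest samples), where the complexity is taken to be the number of support constraints, k + 1 (the k discarded points plus the one active at the optimum). For a continuous data distribution, the out-of-sample risk of this design concentrates around complexity / sample size.

```python
# A minimal sketch (hypothetical toy problem, not the paper's method)
# illustrating that the risk of a relaxed data-driven design approaches
# the ratio complexity / sample size for large samples.
import random

random.seed(0)
n, k = 2000, 100  # sample size and number of relaxed (discarded) constraints
data = [random.random() for _ in range(n)]  # Uniform(0, 1) scenarios

# Optimal relaxed threshold: the (k+1)-th largest sample, i.e. the smallest
# t that covers all scenarios once the k largest ones are discarded.
t = sorted(data)[n - k - 1]

complexity = k + 1  # support constraints: k discarded points + 1 active point
# True out-of-sample risk: probability that a fresh Uniform(0, 1) sample
# violates the design, i.e. exceeds t.
risk = 1.0 - t
print(risk, complexity / n)  # the two quantities are close for large n
```

Here the risk is computable in closed form because the data distribution is known; the point of the theory is that the ratio complexity / n estimates it without that knowledge.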