Paper title
Layered Sampling for Robust Optimization Problems
Authors
Abstract
Real-world datasets often contain outliers, and these outliers can seriously affect the final machine learning result. Most existing algorithms for handling outliers have high time complexity (e.g., quadratic or cubic). {\em Coreset} is a popular approach for compressing data so as to speed up optimization algorithms. However, current coreset methods cannot be easily extended to handle the case with outliers. In this paper, we propose a new variant of the coreset technique, {\em layered sampling}, to deal with two fundamental robust optimization problems: {\em $k$-median/means clustering with outliers} and {\em linear regression with outliers}. This new coreset method is particularly suitable for speeding up iterative algorithms (which often improve the solution within a local range) for these robust optimization problems. Moreover, our method is easy to implement in practice. We expect that our layered sampling framework will be applicable to other robust optimization problems.
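To make the two ideas named in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. The first function computes the standard $k$-median objective with outliers: the sum of distances to the nearest center after discarding the $z$ farthest points. The second is one plausible reading of "layered sampling" around a given solution, and every detail of it is an assumption because the abstract does not specify them: the doubling layer radii, the uniform per-layer sampling, the reweighting, and the parameter names `base_radius`, `num_layers`, and `per_layer` are all hypothetical.

```python
import math
import random


def nearest_center_distance(point, centers):
    """Euclidean distance from `point` to its closest center."""
    return min(math.dist(point, c) for c in centers)


def kmedian_cost_with_outliers(points, centers, z):
    """k-median cost after discarding the z points farthest from the centers."""
    distances = sorted(nearest_center_distance(p, centers) for p in points)
    kept = distances[: max(len(distances) - z, 0)]
    return sum(kept)


def layered_sample(points, centers, base_radius, num_layers, per_layer, seed=0):
    """Hypothetical layered sampler; returns a list of (point, weight) pairs.

    Assumed layering (not taken from the abstract): layer 0 holds points
    within base_radius of a center, and layer j >= 1 holds points whose
    distance lies in (2**(j-1) * base_radius, 2**j * base_radius].
    Points beyond the last layer are all kept with weight 1, so potential
    outliers are never lost by sampling.
    """
    rng = random.Random(seed)
    layers = [[] for _ in range(num_layers)]
    far_points = []

    for p in points:
        d = nearest_center_distance(p, centers)
        j, radius = 0, base_radius
        # Double the radius until the point fits or we run out of layers.
        while d > radius and j < num_layers:
            radius *= 2
            j += 1
        if j < num_layers:
            layers[j].append(p)
        else:
            far_points.append(p)

    sample = []
    for layer in layers:
        if not layer:
            continue
        m = min(per_layer, len(layer))
        weight = len(layer) / m  # each sampled point stands in for len(layer)/m points
        sample.extend((p, weight) for p in rng.sample(layer, m))

    sample.extend((p, 1.0) for p in far_points)
    return sample
```

Under these assumptions the weights in each layer sum to that layer's size, so the total weight of the sample equals the number of input points. An iterative algorithm that only moves the centers within a local range could then be run on the much smaller weighted sample instead of the full dataset.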