论文标题
在滑动窗口中带有离群值的K-中心聚类
k-Center Clustering with Outliers in Sliding Windows
论文作者
论文摘要
公制$ k $ - 中心聚类是一种基本的无监督学习原始学习。尽管使用了广泛使用,但该原始性受数据中噪声的影响很大,因此更明智的变体寻求最佳解决方案,该解决方案无视数据集中的给定数字$ z $,称为离群值。我们为在滑动窗口设置下的流模型中的这种重要变体提供有效的算法,在每个时间步骤中,要群集的数据集是最新数据项的窗口$ w $。我们的算法实现了$ O(1)$近似,并且非常需要以$ k+z $为单位的工作内存线性,而仅在$ | w | $中仅对对数。作为副产品,我们展示了如何估计窗口$ W $的有效直径,这是衡量窗口点的扩展的量度,无视给定的嘈杂距离的给定部分。我们还提供了我们理论结果实际生存能力的实验证据。
Metric $k$-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so that a more sensible variant seeks for the best solution that disregards a given number $z$ of points of the dataset, called outliers. We provide efficient algorithms for this important variant in the streaming model under the sliding window setting, where, at each time step, the dataset to be clustered is the window $W$ of the most recent data items. Our algorithms achieve $O(1)$ approximation and, remarkably, require a working memory linear in $k+z$ and only logarithmic in $|W|$. As a by-product, we show how to estimate the effective diameter of the window $W$, which is a measure of the spread of the window points, disregarding a given fraction of noisy distances. We also provide experimental evidence of the practical viability of our theoretical results.