（k，l） - 使用连续的动态时间翘曲的媒体聚类

论文标题

（k，l） - 使用连续的动态时间翘曲的媒体聚类

(k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time Warping

论文作者

Brankovic, Milutin, Buchin, Kevin, Klaren, Koen, Nusser, André, Popov, Aleksandr, Wong, Sampson

论文摘要

由于可用的地理空间数据的数量大量增加，并且需要以可理解的方式展示它，因此将这些数据聚类比以往任何时候都更为重要。由于群集可能包含大量对象，因此每个集群的代表都显着促进了理解聚类。依赖此类代表的聚类方法称为基于中心。在这项工作中，我们考虑了基于中心的轨迹聚类的问题。在这种情况下，集群的代表再次是轨迹。为了获得簇的紧凑表示并避免过度拟合，我们通过参数l限制了代表轨迹的复杂性。然而，这种限制使离散的距离措施之类的范围诸如动态时间扭曲（DTW）不适合。最近有关于具有连续距离度量的轨迹基于中心的聚类的工作，即Fréchet距离。尽管Fréchet距离允许限制中心的复杂性，但它也可能对离群值敏感，而平均型距离距离（例如DTW）的距离较少。为了获得轨迹聚类算法，该算法允许限制中心的复杂性，并且对离群值更强大，我们建议使用连续版本的DTW作为距离度量，我们称之为连续的动态时间扭曲（CDTW）。我们的贡献是双重的： 1。为了打击缺乏CDTW的实际算法，我们开发了一种计算它的近似算法。 2。我们在此距离度量下开发了第一个聚类算法，并展示了一种从一组轨迹中计算中心的实用方法，然后迭代地改进了中心。为了了解在CDTW下对实际数据的聚类结果的见解，我们进行了广泛的实验。

Due to the massively increasing amount of available geospatial data and the need to present it in an understandable way, clustering this data is more important than ever. As clusters might contain a large number of objects, having a representative for each cluster significantly facilitates understanding a clustering. Clustering methods relying on such representatives are called center-based. In this work we consider the problem of center-based clustering of trajectories. In this setting, the representative of a cluster is again a trajectory. To obtain a compact representation of the clusters and to avoid overfitting, we restrict the complexity of the representative trajectories by a parameter l. This restriction, however, makes discrete distance measures like dynamic time warping (DTW) less suited. There is recent work on center-based clustering of trajectories with a continuous distance measure, namely, the Fréchet distance. While the Fréchet distance allows for restriction of the center complexity, it can also be sensitive to outliers, whereas averaging-type distance measures, like DTW, are less so. To obtain a trajectory clustering algorithm that allows restricting center complexity and is more robust to outliers, we propose the usage of a continuous version of DTW as distance measure, which we call continuous dynamic time warping (CDTW). Our contribution is twofold: 1. To combat the lack of practical algorithms for CDTW, we develop an approximation algorithm that computes it. 2. We develop the first clustering algorithm under this distance measure and show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it. To obtain insights into the results of clustering under CDTW on practical data, we conduct extensive experiments.

下载PDF全文

下载文献需遵守相关版权规定

论文标题