Paper title
Infostop: Scalable stop-location detection in multi-user mobility data
Authors
Abstract
Data-driven research in mobility has prospered in recent years, providing solutions to real-world challenges including forecasting epidemics and planning transportation. These advancements were facilitated by computational tools enabling the analysis of large-scale datasets of digital traces. One of the challenges when pre-processing spatial trajectories is the so-called stop-location detection, which entails reducing raw time series to sequences of destinations where an individual was stationary. The most widely adopted solution to this problem was proposed by Hariharan and Toyama (2004) and involves filtering out non-stationary measurements, then applying agglomerative clustering to the stationary points. This state-of-the-art solution, however, suffers from two limitations: (i) frequently visited places located very close to each other (such as adjacent buildings) are likely to be merged into a single location, due to inherent measurement noise; (ii) traces for multiple users cannot be analysed simultaneously, so the definition of a destination is not shared across users. In this paper, we describe the Infostop algorithm, which overcomes the limitations of the state-of-the-art solution by leveraging the flow-based network community detection algorithm Infomap. We test Infostop on a population of $\sim 1000$ individuals with highly overlapping mobility. We show that the size of locations detected by Infostop saturates as the number of users increases, and that its time complexity grows more slowly than that of previous solutions. We demonstrate that Infostop can be used to easily infer social meetings. Finally, we provide an open-source implementation of Infostop, written in Python and C++, that has a simple API and can be used to label both time-ordered coordinate sequences (GPS or otherwise) and unordered sets of spatial points.
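To make the two-step idea in the abstract concrete (filter out non-stationary measurements, then group the stationary points into locations), the following is a minimal sketch under stated assumptions, not the Infostop implementation or its API. All function names, the 30 m radius, and the 300 s minimum duration are hypothetical choices made for illustration. For brevity the grouping step uses plain connected components over a distance-threshold graph; it stands in for both the agglomerative clustering of the baseline and the Infomap community detection that Infostop uses, and it does not reproduce the behaviour of either.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def _flush(group, events, min_duration_s):
    """Keep a group as a stationary event only if it lasted long enough."""
    if group and group[-1][0] - group[0][0] >= min_duration_s:
        n = len(group)
        events.append((group[0][0], group[-1][0],
                       sum(g[1] for g in group) / n,
                       sum(g[2] for g in group) / n))


def stationary_events(trace, max_radius_m=30.0, min_duration_s=300.0):
    """Collapse a time-ordered trace of (t, lat, lon) into stationary events.

    Consecutive samples stay in one group while they lie within max_radius_m
    of the group's first sample. Groups shorter than min_duration_s are
    treated as movement and dropped (hypothetical thresholds).
    Returns (t_start, t_end, mean_lat, mean_lon) tuples.
    """
    events, group = [], []
    for t, lat, lon in trace:
        if group and haversine_m(group[0][1:], (lat, lon)) > max_radius_m:
            _flush(group, events, min_duration_s)
            group = []
        group.append((t, lat, lon))
    _flush(group, events, min_duration_s)
    return events


def label_events(events, link_radius_m=30.0):
    """Label events by connected components of the 'closer than link_radius_m' graph.

    Stand-in for the clustering step only; NOT Infomap.
    """
    labels = [-1] * len(events)
    current = 0
    for i in range(len(events)):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            u = stack.pop()
            for v in range(len(events)):
                if labels[v] == -1 and haversine_m(events[u][2:], events[v][2:]) <= link_radius_m:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


# Synthetic trace: home -> work -> home, one sample per minute (made-up coordinates).
home, work = (55.0, 12.0), (55.02, 12.0)
trace = [(t, *home) for t in range(0, 601, 60)]
trace += [(660, 55.005, 12.0), (720, 55.01, 12.0)]      # travelling
trace += [(t, *work) for t in range(780, 1381, 60)]
trace += [(1440, 55.01, 12.0)]                           # travelling
trace += [(t, *home) for t in range(1500, 2101, 60)]

events = stationary_events(trace)
labels = label_events(events)
print(labels)  # [0, 1, 0]
```

The two home visits receive the same label, so the raw time series reduces to the destination sequence home, work, home. Because connected components chain together any points linked by a short hop, this stand-in is prone to merging adjacent places, which is the first limitation the abstract attributes to the baseline.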