Paper title
An individual-level ground truth dataset for home location detection
Paper authors
Paper abstract
Home detection, assigning a phone device to its home antenna, is a ubiquitous part of most studies in the literature on mobile phone data. Despite its widespread use, home detection relies on a few assumptions that are difficult to check without ground truth, i.e., knowledge of where the individual who owns the device actually resides. In this paper, we provide an unprecedented evaluation of the accuracy of home detection algorithms on a group of sixty-five participants for whom we know the exact home address and the antennas that might serve them. In addition, we analyze not only Call Detail Records (CDRs) but also two other mobile phone streams: eXtended Detail Records (XDRs, the "data" channel) and Control Plane Records (CPRs, the network stream). These data streams differ not only in their temporal granularity but also in their data generation mechanism; e.g., CDRs are purely human-triggered events, while CPRs are purely machine-triggered events. Finally, we quantify the amount of data each stream needs for successful home detection. We find that the choice of stream and algorithm heavily influences home detection: an hour-of-day algorithm applied to XDRs performs best overall, while CPRs perform best in terms of the amount of data needed to carry out home detection. Our work is useful for researchers and practitioners who want to minimize data requests while maximizing the accuracy of home antenna location.
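The abstract does not define the algorithms it compares, so the following is only a hedged illustration of the general idea: a minimal Python sketch of one common style of home detection (assign each device the antenna it is seen on most often during assumed night-time "home hours"), an accuracy check against ground-truth sets of antennas that could serve each participant's home, and a helper that truncates the data to the first n days to mimic the data-amount question. The 20:00 to 07:00 window, the `(device_id, antenna_id, timestamp)` record layout, all function names, and the toy events are hypothetical; the paper's hour-of-day algorithm may be defined differently.

```python
from collections import Counter
from datetime import datetime

# Hypothetical "home hours" window (an assumption, not taken from the paper).
NIGHT_START = 20  # inclusive hour
NIGHT_END = 7     # exclusive hour


def is_home_hour(ts: datetime) -> bool:
    """True if the timestamp falls inside the assumed night-time window."""
    return ts.hour >= NIGHT_START or ts.hour < NIGHT_END


def detect_home(events):
    """Map each device to its most frequent antenna during home hours.

    `events` is an iterable of hypothetical (device_id, antenna_id, timestamp) tuples.
    Devices with no home-hour events get no prediction.
    """
    counts = {}
    for device, antenna, ts in events:
        if is_home_hour(ts):
            counts.setdefault(device, Counter())[antenna] += 1
    return {device: c.most_common(1)[0][0] for device, c in counts.items()}


def accuracy(predicted, ground_truth):
    """Fraction of devices whose detected antenna is among the antennas
    that could serve their known home address (ground_truth: device -> set)."""
    if not ground_truth:
        return 0.0
    hits = sum(1 for device, antennas in ground_truth.items()
               if predicted.get(device) in antennas)
    return hits / len(ground_truth)


def first_n_days(events, n):
    """Keep only events within n days of the earliest event, to study
    how accuracy changes with the amount of data available."""
    events = list(events)
    if not events:
        return []
    start = min(ts for _, _, ts in events)
    return [e for e in events if (e[2] - start).days < n]


# Toy events invented purely for illustration.
EVENTS = [
    ("d1", "A", datetime(2023, 1, 1, 23)),
    ("d1", "A", datetime(2023, 1, 2, 2)),
    ("d1", "B", datetime(2023, 1, 2, 14)),
    ("d1", "B", datetime(2023, 1, 2, 15)),
    ("d1", "B", datetime(2023, 1, 2, 16)),
    ("d2", "C", datetime(2023, 1, 1, 22)),
    ("d2", "D", datetime(2023, 1, 2, 6)),
    ("d2", "D", datetime(2023, 1, 2, 5)),
    ("d1", "C", datetime(2023, 1, 5, 1)),
]

if __name__ == "__main__":
    homes = detect_home(EVENTS)
    truth = {"d1": {"A"}, "d2": {"C"}, "d3": {"E"}}
    print(homes, accuracy(homes, truth))
```

In the toy data, device `d1` is seen most often on antenna `B` overall, but only during daytime hours, so restricting to the night window assigns it antenna `A`. Running `detect_home` on `first_n_days(EVENTS, n)` for increasing `n` is one simple way to trace accuracy as a function of data volume.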