Paper Title

A Supervised Machine Learning Model for Imputing Missing Boarding Stops in Smart Card Data

Authors

Nadav Shalit, Michael Fire, Eran Ben-Elia

Abstract

Public transport has become an essential part of urban life as population densities and environmental awareness have increased. Large quantities of data are now generated, and harvesting smart card usage allows more robust methods for understanding travel behavior. However, public transport datasets suffer from data integrity problems: boarding stop information may be missing because of imperfect acquisition processes or inadequate reporting. We developed a supervised machine learning method that imputes missing boarding stops through ordinal classification, using GTFS timetable, smart card, and geospatial datasets. We propose a new metric, Pareto Accuracy, to evaluate algorithms whose classes have an ordinal nature. Results are based on a case study in the city of Beer Sheva, Israel, consisting of one month of smart card data. We show that the proposed method is robust to irregular travelers and significantly outperforms well-known imputation methods without the need to mine any additional datasets. Validation on data from another Israeli city using transfer learning shows that the presented model is general and context-free. The implications for transportation planning and travel behavior research are further discussed.
