Paper title
A machine learning methodology for real-time forecasting of the 2019-2020 COVID-19 outbreak using Internet searches, news alerts, and estimates from mechanistic models
Authors
Abstract
We present a timely and novel methodology that combines disease estimates from mechanistic models with digital traces, via interpretable machine-learning methods, to reliably forecast COVID-19 activity in Chinese provinces in real time. Specifically, our method produces stable and accurate forecasts 2 days ahead of the current time, using as inputs (a) official health reports from the Chinese Center for Disease Control and Prevention (China CDC), (b) COVID-19-related Internet search activity from Baidu, (c) news media activity reported by Media Cloud, and (d) daily forecasts of COVID-19 activity from GLEAM, an agent-based mechanistic model. Our machine-learning methodology uses a clustering technique that exploits geo-spatial synchronicities of COVID-19 activity across Chinese provinces, and a data augmentation technique to cope with the small number of historical disease activity observations that is characteristic of emerging outbreaks. Our model outperforms a collection of baseline models in predictive power in 27 of the 32 Chinese provinces, and could easily be extended to other geographies currently affected by the COVID-19 outbreak to help decision makers.
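The abstract names two ingredients without specifying them: clustering provinces whose COVID-19 activity moves in sync, and data augmentation to compensate for short case histories. The sketch below is one possible reading under stated assumptions, not the authors' implementation. It assumes correlation-threshold (single-linkage) clustering of provincial case curves. It also assumes that "augmentation" means pooling (features at day t, cases at day t + 2) training pairs from every province in a cluster. Finally, it substitutes ridge regression for the unspecified interpretable model. All function names, the 0.9 threshold, and the penalty `lam` are hypothetical.

```python
"""Hypothetical sketch: cluster provinces by correlated activity, then pool
their observations (assumed form of data augmentation) to fit one
2-day-ahead ridge model per cluster. Names, thresholds, and the choice of
ridge regression are illustrative assumptions, not the paper's method."""
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def cluster_provinces(series, threshold=0.9):
    """Single-linkage grouping (assumed): provinces whose case curves
    correlate at or above `threshold`, directly or through a chain, share a
    cluster. Uses union-find with path halving."""
    names = sorted(series)
    parent = {p: p for p in names}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(series[a], series[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for p in names:
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


def pooled_rows(cluster, cases, features, horizon=2):
    """Pool training pairs from every province in the cluster: the row is
    [intercept, cases at day t] + digital-trace features at day t, and the
    target is cases at day t + horizon."""
    X, y = [], []
    for p in cluster:
        for t in range(len(cases[p]) - horizon):
            X.append([1.0, cases[p][t]] + features[p][t])
            y.append(cases[p][t + horizon])
    return X, y


def ridge_fit(X, y, lam=1e-3):
    """Solve (X^T X + lam*I) w = X^T y by Gauss-Jordan elimination with
    partial pivoting. The intercept is penalized too, for simplicity."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    for c in range(d):
        piv = max(range(c, d), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(d):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[c])]
                b[r] -= f * b[c]
    return [b[i] / A[i][i] for i in range(d)]
```

With two synchronized provinces that each have eight days of data, a 2-day horizon leaves only six usable pairs per province. Pooling them yields a twelve-row cluster-level training set. In the setting described above, each feature row would presumably carry the Baidu search, Media Cloud news, and GLEAM inputs, but how the paper actually encodes them is not stated here.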