Paper title
How the world's collective attention is being paid to a pandemic: COVID-19 related n-gram time series for 24 languages on Twitter
Paper authors
Paper abstract
In confronting the global spread of the coronavirus disease COVID-19, we must have coordinated medical, operational, and political responses. In all efforts, data is crucial. Fundamentally, and in the possible absence of a vaccine for 12 to 18 months, we need universal, well-documented testing for both the presence of the disease and confirmed recovery through serological tests for antibodies, and we need to track major socioeconomic indices. But we also need auxiliary data of all kinds, including data on how populations are talking about the unfolding pandemic through news and stories. To help in part on the social media side, we curate a set of 2000 day-scale time series of 1- and 2-grams across 24 languages on Twitter that are most 'important' for April 2020 with respect to April 2019. We determine importance through our allotaxonometric instrument, rank-turbulence divergence. We make some basic observations about some of the time series, including a comparison to the number of confirmed deaths due to COVID-19 over time. Across all languages, we broadly observe a peak for the language-specific word for 'virus' in January 2020, followed by a decline through February and then a surge through March and April. The world's collective attention dropped away while the virus spread out from China. We host the time series on GitLab, updating them on a daily basis while relevant. Our main intent is for other researchers to use these time series to enhance whatever analyses may be of use during the pandemic, as well as for retrospective investigations.
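As a rough illustration of how a rank-based comparison between April 2019 and April 2020 can surface 'important' n-grams, the following minimal sketch scores each n-gram by the size of its rank change between two count tables. It is an assumption-laden simplification, not the authors' pipeline: the function names, the toy counts, and the choice alpha = 1/3 are all hypothetical, and the normalization constant of the full rank-turbulence divergence is omitted, which does not change the ordering of the per-n-gram scores.

```python
def ranks_from_counts(counts):
    """Map each n-gram to a 1-based rank by descending count; ties share the average rank."""
    items = sorted(counts.items(), key=lambda kv: -kv[1])
    ranks = {}
    i = 0
    while i < len(items):
        j = i
        # Find the end of the run of n-grams with the same count.
        while j < len(items) and items[j][1] == items[i][1]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[items[k][0]] = avg_rank
        i = j
    return ranks


def rank_turbulence_contributions(counts_a, counts_b, alpha=1 / 3):
    """Unnormalized per-n-gram rank-change scores (alpha > 0 is an illustrative choice)."""
    vocab = set(counts_a) | set(counts_b)
    # N-grams missing from one period get a zero count, so they tie for last place.
    ranks_a = ranks_from_counts({t: counts_a.get(t, 0) for t in vocab})
    ranks_b = ranks_from_counts({t: counts_b.get(t, 0) for t in vocab})
    scale = (alpha + 1) / alpha
    return {
        t: scale * abs(ranks_a[t] ** -alpha - ranks_b[t] ** -alpha) ** (1 / (alpha + 1))
        for t in vocab
    }


# Toy, made-up counts standing in for the two months being compared.
counts_2019 = {"the": 100, "of": 80, "spring": 10}
counts_2020 = {"the": 100, "virus": 90, "of": 70, "spring": 5}

contrib = rank_turbulence_contributions(counts_2019, counts_2020)
top_ngrams = sorted(contrib, key=contrib.get, reverse=True)
print(top_ngrams)
```

In this toy example, an n-gram that keeps the same rank in both periods scores zero, while one that jumps from absent to near the top scores highest. Selecting the highest-scoring n-grams per language is one plausible way to arrive at a curated list like the one described above.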