Paper Title
Tweet Emotion Dynamics: Emotion Word Usage in Tweets from US and Canada
Paper Authors
Paper Abstract
Over the last decade, Twitter has emerged as one of the most influential forums for social, political, and health discourse. In this paper, we introduce a massive dataset of more than 45 million geo-located tweets posted between 2015 and 2021 from the US and Canada (TUSC), curated especially for natural language analysis. We also introduce Tweet Emotion Dynamics (TED): metrics that capture patterns of emotions associated with tweets over time. We use TED and TUSC to explore the use of emotion-associated words across the US and Canada; across 2019 (pre-pandemic), 2020 (the year the pandemic hit), and 2021 (the second year of the pandemic); and across individual tweeters. We show that Canadian tweets tend to have higher valence, lower arousal, and higher dominance than US tweets. Further, we show that the COVID-19 pandemic had a marked impact on the emotional signature of tweets posted in 2020, compared with the adjoining years. Finally, we determine TED metrics for 170,000 tweeters to benchmark the characteristics of these metrics at an aggregate level. TUSC and the TED metrics will enable a wide variety of research on how we use language to express ourselves, persuade, communicate, and influence, with particularly promising applications in public health, affective science, social science, and psychology.
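To make "patterns of emotions associated with tweets over time" more concrete, here is a minimal sketch of one common lexicon-based approach: score each tweet by averaging the valence of the emotion words it contains, then smooth a tweeter's chronologically ordered scores with a trailing moving average. This is an illustrative assumption, not the paper's definition of the TED metrics. The lexicon `TOY_VALENCE`, its scores, the helper names `tweet_valence` and `rolling_mean`, and the `window` parameter are all hypothetical. A real analysis would load an established emotion lexicon rather than these invented values.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical toy lexicon: word -> valence score in [0, 1].
# The values are invented for illustration only, not taken from any real lexicon.
TOY_VALENCE: Dict[str, float] = {
    "happy": 0.9,
    "love": 0.95,
    "sad": 0.1,
    "angry": 0.15,
    "tired": 0.3,
}


def tweet_valence(text: str, lexicon: Dict[str, float]) -> Optional[float]:
    """Average lexicon score of the emotion words in one tweet (None if none match)."""
    # Naive whitespace tokenization with trailing/leading punctuation stripped.
    tokens = [t.strip(".,!?;:\"'").lower() for t in text.split()]
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return mean(scores) if scores else None


def rolling_mean(values: List[float], window: int) -> List[float]:
    """Trailing moving average over a tweeter's chronologically ordered scores."""
    out: List[float] = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(mean(chunk))
    return out


# Hypothetical tweets from one user, in posting order.
tweets = [
    "So happy today!",
    "Tired and sad.",
    "Love this weather",
]
# Skip tweets with no lexicon words rather than treating them as neutral.
scores = [s for s in (tweet_valence(t, TOY_VALENCE) for t in tweets) if s is not None]
print(scores)
print(rolling_mean(scores, window=2))
```

Dropping tweets with no matching lexicon words, instead of assigning them a neutral score, is one design choice among several; it keeps the smoothed series from being pulled toward an arbitrary midpoint, at the cost of uneven spacing in time.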