Paper title
Word frequency and sentiment analysis of Twitter messages during Coronavirus pandemic
Authors
Abstract
The COVID-19 epidemic has had a great impact on social media conversation, especially on sites like Twitter, which has emerged as a hub for public reaction and information sharing. This paper analyzes a large dataset of Twitter messages related to the disease, starting from January 2020. Two approaches were used: a statistical analysis of word frequencies and a sentiment analysis to gauge user attitudes. Word frequencies are modeled using unigrams, bigrams, and trigrams, with a power-law distribution as the fitting model. The validity of the model is confirmed through metrics such as the Sum of Squared Errors (SSE), R-squared ($R^2$), and Root Mean Squared Error (RMSE). High $R^2$ and low SSE/RMSE values indicate that the model fits well. Sentiment analysis is conducted to understand the general emotional tone of Twitter users' messages. The results reveal that a majority of tweets exhibit neutral sentiment polarity, with only 2.57\% expressing negative polarity.
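As an illustration of the word-frequency approach described in the abstract, the following minimal sketch counts n-grams, fits a power law $f(r) = a\,r^{b}$ to the rank-frequency data, and reports SSE, $R^2$, and RMSE. It is not the authors' implementation. The function names (`ngram_counts`, `fit_power_law`, `goodness_of_fit`), the choice of least squares in log-log space, and the evaluation of the metrics on the original frequency scale are all assumptions made for this example; the abstract does not state how the fit was computed.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def fit_power_law(freqs):
    """Fit f(r) = a * r**b to rank-frequency data.

    Assumption: ordinary least squares on (log rank, log frequency).
    Requires at least two positive frequencies.
    """
    ys = sorted(freqs, reverse=True)  # rank 1 = most frequent
    lx = [math.log(r) for r in range(1, len(ys) + 1)]
    ly = [math.log(y) for y in ys]
    n = len(ys)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    b = sxy / sxx                     # slope in log-log space (exponent)
    a = math.exp(my - b * mx)         # intercept mapped back to a scale factor
    return a, b


def goodness_of_fit(freqs, a, b):
    """Return (SSE, R^2, RMSE) of the fitted curve on the original scale."""
    ys = sorted(freqs, reverse=True)
    preds = [a * r ** b for r in range(1, len(ys) + 1)]
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)  # must be nonzero
    r2 = 1.0 - sse / sst
    rmse = math.sqrt(sse / len(ys))
    return sse, r2, rmse


if __name__ == "__main__":
    # Toy token list standing in for preprocessed tweets (hypothetical data).
    tokens = ("stay home stay safe wash hands stay home "
              "wear mask stay safe wash hands stay home").split()
    for n in (1, 2, 3):
        counts = ngram_counts(tokens, n)
        a, b = fit_power_law(counts.values())
        sse, r2, rmse = goodness_of_fit(counts.values(), a, b)
        print(f"{n}-grams: a={a:.3f} b={b:.3f} "
              f"SSE={sse:.3f} R2={r2:.3f} RMSE={rmse:.3f}")
```

A fit in log-log space is simple but weights low-frequency ranks heavily; a nonlinear least-squares fit on the original scale is a common alternative and may give different SSE and $R^2$ values.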