论文标题
电力Twitter帖子的情感分析
Sentiment analysis on electricity twitter posts
论文作者
论文摘要
在当今的世界中,每个人都以某种方式表现出来,而该项目的重点是人们使用Twitter的数据(一个微博平台)的数据,人们对英国和印度的电力价格上涨的看法,这是一个微博平台,人们在该平台上发布了消息,称为Tweets。由于许多人的收入不好,他们必须缴纳如此多的税款和账单,因此如今,维持房屋已成为有争议的问题。尽管政府提供了补贴计划来补偿人们的电费,但不受人们的欢迎。在这个项目中,目的是对Twitter上表达的人们的表情和观点进行情感分析。为了掌握电价的意见,有必要对能源市场的政府和消费者进行情感分析。此外,这些媒体上存在的文本本质上是非结构化的,因此要处理它们,我们首先需要预处理数据。有很多特征提取技术,例如单词袋,tf-idf(术语频率为单位文档频率),单词嵌入,基于NLP的功能,例如Word Count。在这个项目中,我们分析了特征TF-IDF单词级别对电力账单数据集的影响。我们发现,通过使用TF-IDF单词级别的表现,情感分析的表现比使用N-Gram功能高3-4。使用四种分类算法进行分析,包括天真的贝叶斯,决策树,随机森林和逻辑回归,并考虑F得分,准确性,精度和召回性能参数。
In today's world, everyone is expressive in some way, and the focus of this project is on people's opinions about rising electricity prices in United Kingdom and India using data from Twitter, a micro-blogging platform on which people post messages, known as tweets. Because many people's incomes are not good and they have to pay so many taxes and bills, maintaining a home has become a disputed issue these days. Despite the fact that Government offered subsidy schemes to compensate people electricity bills but it is not welcomed by people. In this project, the aim is to perform sentiment analysis on people's expressions and opinions expressed on Twitter. In order to grasp the electricity prices opinion, it is necessary to carry out sentiment analysis for the government and consumers in energy market. Furthermore, text present on these medias are unstructured in nature, so to process them we firstly need to pre-process the data. There are so many feature extraction techniques such as Bag of Words, TF-IDF (Term Frequency-Inverse Document Frequency), word embedding, NLP based features like word count. In this project, we analysed the impact of feature TF-IDF word level on electricity bills dataset of sentiment analysis. We found that by using TF-IDF word level performance of sentiment analysis is 3-4 higher than using N-gram features. Analysis is done using four classification algorithms including Naive Bayes, Decision Tree, Random Forest, and Logistic Regression and considering F-Score, Accuracy, Precision, and Recall performance parameters.