Title
Sentiment Analysis of Yelp Reviews: A Comparison of Techniques and Models
Authors
Abstract
We use over 350,000 Yelp reviews of 5,000 restaurants to perform an ablation study on text preprocessing techniques. We also compare the effectiveness of several machine learning and deep learning models at predicting user sentiment (negative, neutral, or positive). For machine learning models, we find that using a binary bag-of-words representation, adding bi-grams, imposing minimum frequency constraints, and normalizing text each have a positive effect on model performance. For deep learning models, we find that using pre-trained word embeddings and capping the maximum input length often boost model performance. Finally, using macro F1 score as our comparison metric, we find that simpler models such as Logistic Regression and Support Vector Machine predict sentiment more effectively than more complex models such as Gradient Boosting, LSTM, and BERT.
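The machine-learning preprocessing choices listed above (binary bag-of-words, added bi-grams, a minimum frequency cutoff, and text normalization) can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: normalization is assumed to mean lowercasing and replacing punctuation with spaces, the minimum frequency constraint is assumed to be a document-frequency cutoff, and the helper names `normalize`, `ngrams`, `build_vocab`, and `binary_bow` are hypothetical.

```python
import re
from collections import Counter


def normalize(text):
    """Assumed normalization: lowercase, turn runs of non-alphanumerics into spaces."""
    return re.sub(r"[^a-z0-9']+", " ", text.lower()).strip()


def ngrams(tokens, n_max=2):
    """Return all unigrams plus n-grams up to n_max (bi-grams by default)."""
    grams = list(tokens)
    for n in range(2, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def build_vocab(docs, min_freq=2, n_max=2):
    """Keep only n-grams that occur in at least min_freq documents (assumed cutoff)."""
    doc_freq = Counter()
    for doc in docs:
        # A set, so each n-gram counts at most once per document.
        doc_freq.update(set(ngrams(normalize(doc).split(), n_max)))
    kept = sorted(g for g, c in doc_freq.items() if c >= min_freq)
    return {g: i for i, g in enumerate(kept)}


def binary_bow(doc, vocab, n_max=2):
    """Binary vector: 1 if a vocabulary n-gram appears in doc at all, else 0."""
    vec = [0] * len(vocab)
    for g in ngrams(normalize(doc).split(), n_max):
        idx = vocab.get(g)
        if idx is not None:
            vec[idx] = 1  # presence only; repeated occurrences are not counted
    return vec
```

For example, with the three reviews "The food was great!", "Great food, slow service." and "Slow service and cold food." and `min_freq=2`, only `food`, `great`, `service`, `slow`, and the bi-gram `slow service` survive the cutoff. In practice, scikit-learn's `CountVectorizer(binary=True, ngram_range=(1, 2), min_df=2)` covers the same three options, though with its own default tokenizer.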
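Macro F1 averages the per-class F1 scores with equal weight, so a model cannot score well simply by favoring whichever sentiment class is most common. The sketch below computes it for the three labels used here, assuming string labels `negative`, `neutral`, and `positive` and the common convention that a class with undefined precision or recall contributes F1 = 0; the function name `macro_f1` is hypothetical.

```python
def macro_f1(y_true, y_pred, labels=("negative", "neutral", "positive")):
    """Unweighted mean of per-class F1 scores over the given labels."""
    f1s = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # c predicted as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)
```

With four positive, one negative, and one neutral review, a classifier that always predicts `positive` reaches an accuracy of 4/6 ≈ 0.67 but a macro F1 of only 0.8 / 3 ≈ 0.27, because the two minority classes each score 0. scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.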