Paper Title

Regularised Text Logistic Regression: Key Word Detection and Sentiment Classification for Online Reviews

Authors

Ying Chen, Peng Liu, Chung Piaw Teo

Abstract

Online customer reviews have become important for managers and executives in the hospitality and catering industry who wish to obtain a comprehensive understanding of their customers' demands and expectations. We propose a Regularized Text Logistic (RTL) regression model to perform text analytics and sentiment classification on unstructured text data. The model automatically identifies a set of statistically significant and operationally insightful word features and achieves satisfactory predictive classification accuracy. We apply the RTL model to two online review datasets from TripAdvisor, Restaurant and Hotel. Our results demonstrate satisfactory classification performance compared with alternative classifiers, with the highest true positive rate reaching 94.9%. Moreover, RTL identifies a small set of word features, corresponding to 3% for Restaurant and 20% for Hotel, which boosts working efficiency by allowing managers to drill down into a much smaller set of important customer reviews. We also establish the consistency, sparsity, and oracle properties of the estimator.
