Paper Title
A Variational Approach to Unsupervised Sentiment Analysis
Authors
Abstract
In this paper, we propose a variational approach to unsupervised sentiment analysis. Instead of using ground-truth labels provided by domain experts, we use target-opinion word pairs as the supervision signal. For example, in the document snippet "the room is big," (room, big) is a target-opinion word pair. These word pairs can be extracted with a dependency parser and simple rules. Our training objective is to predict an opinion word given a target word, while our ultimate goal is to learn a sentiment classifier. By introducing a latent variable, the sentiment polarity, into the objective function, we can inject the sentiment classifier into the objective via the evidence lower bound (ELBO), and we learn the classifier by optimizing this lower bound. We also impose sophisticated constraints on opinion words as regularization: if two documents have similar (dissimilar) opinion words, the sentiment classifier is encouraged to produce similar (different) probability distributions for them. We apply our method to sentiment analysis of customer reviews and clinical narratives. The experimental results show that our method outperforms unsupervised baselines on sentiment analysis tasks in both domains. In the customer review domain, it obtains results comparable to those of a supervised method trained with hundreds of labels per aspect, and in the clinical narrative domain, it obtains results comparable to those of supervised methods.
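The bound mentioned above can be illustrated with a minimal NumPy sketch. It assumes a factorization that the abstract suggests but does not spell out: a document-level classifier q(s | x) over K polarities, an opinion-word predictor p(o | t, s) over a vocabulary of V opinion words, and a prior p(s) (uniform by default). Under these assumptions, log p(o | t) >= E_q[log p(o | t, s)] - KL(q(s | x) || p(s)), with equality when q equals the true posterior over s. The regularizer is likewise a hypothetical stand-in for the paper's constraints: it uses Jaccard overlap of opinion-word sets, a similarity threshold, and a hinge margin, none of which come from the source. All function names (`elbo`, `log_marginal`, `pair_regularizer`) are illustrative only.

```python
import numpy as np


def log_softmax(z):
    # Numerically stable log-softmax along the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def elbo(q_logits, pred_logits, opinion_id, prior=None):
    """ELBO for one observed (target, opinion) pair in one document.

    q_logits:    shape (K,), scores of the (assumed) classifier q(s | x)
    pred_logits: shape (K, V), scores of the (assumed) predictor p(o | t, s)
    opinion_id:  index of the observed opinion word o
    prior:       shape (K,), p(s); uniform if None (an assumption)
    """
    K = q_logits.shape[0]
    if prior is None:
        prior = np.full(K, 1.0 / K)
    log_q = log_softmax(q_logits)
    q = np.exp(log_q)
    # log p(o | t, s) for every polarity s
    log_p_o = log_softmax(pred_logits)[:, opinion_id]
    expected_ll = np.sum(q * log_p_o)
    kl = np.sum(q * (log_q - np.log(prior)))
    return expected_ll - kl


def log_marginal(pred_logits, opinion_id, prior=None):
    # Exact log p(o | t) = log sum_s p(s) p(o | t, s), via log-sum-exp.
    K = pred_logits.shape[0]
    if prior is None:
        prior = np.full(K, 1.0 / K)
    a = np.log(prior) + log_softmax(pred_logits)[:, opinion_id]
    m = a.max()
    return m + np.log(np.exp(a - m).sum())


def jaccard(a, b):
    # Overlap of two opinion-word collections (hypothetical similarity choice).
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def pair_regularizer(q_i, q_j, ops_i, ops_j, threshold=0.5, margin=0.5):
    # Hypothetical constraint: pull classifier outputs together for documents
    # with similar opinion words, push them apart (up to a margin) otherwise.
    dist = np.sum((q_i - q_j) ** 2)
    if jaccard(ops_i, ops_j) >= threshold:
        return dist
    return max(0.0, margin - dist)
```

For any classifier scores, `elbo(...)` never exceeds `log_marginal(...)`, which is why maximizing the bound trains the classifier indirectly through the opinion-prediction task. In a full model, both logit arrays would come from neural encoders and the sum of ELBO terms minus the weighted regularizer would be optimized with a gradient-based library; NumPy is used here only to make the arithmetic explicit.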