Paper Title
Leveraging Foreign Language Labeled Data for Aspect-Based Opinion Mining
Paper Authors
Abstract
Aspect-based opinion mining is the task of identifying sentiment at the aspect level in opinionated text, which consists of two subtasks: aspect category extraction and sentiment polarity classification. While aspect category extraction aims to detect and categorize opinion targets such as product features, sentiment polarity classification assigns a sentiment label, i.e. positive, negative, or neutral, to each identified aspect. Supervised learning methods have been shown to deliver better accuracy for this task but they require labeled data, which is costly to obtain, especially for resource-poor languages like Vietnamese. To address this problem, we present a supervised aspect-based opinion mining method that utilizes labeled data from a foreign language (English in this case), which is translated to Vietnamese by an automated translation tool (Google Translate). Because aspects and opinions in different languages may be expressed by different words, we propose using word embeddings, in addition to other features, to reduce the vocabulary difference between the original and translated texts, thus improving the effectiveness of aspect category extraction and sentiment polarity classification processes. We also introduce an annotated corpus of aspect categories and sentiment polarities extracted from restaurant reviews in Vietnamese, and conduct a series of experiments on the corpus. Experimental results demonstrate the effectiveness of the proposed approach.
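The idea of adding word embeddings to lexical features can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding values, the tiny vocabulary, and the helper names (`bag_of_words`, `mean_embedding`, `featurize`, `cosine`) are all hypothetical. It only shows how a machine-translated sentence and a native sentence that use different opinion words can end up closer together once averaged embeddings are appended to a bag-of-words vector.

```python
from math import sqrt

# Toy 3-dimensional embeddings; the values are invented for illustration only.
TOY_EMBEDDINGS = {
    "ngon": [0.9, 0.1, 0.0],    # "delicious"
    "tuyệt": [0.8, 0.2, 0.1],   # "great"
    "phở": [0.1, 0.9, 0.0],     # a noodle dish (aspect term)
    "chậm": [-0.7, 0.0, 0.6],   # "slow"
}


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    return [float(tokens.count(word)) for word in vocabulary]


def mean_embedding(tokens, embeddings):
    """Average the embeddings of known tokens; zeros if none are known."""
    dim = len(next(iter(embeddings.values())))
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def featurize(tokens, vocabulary, embeddings):
    """Concatenate lexical counts with the averaged embedding."""
    return bag_of_words(tokens, vocabulary) + mean_embedding(tokens, embeddings)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vocabulary = sorted(TOY_EMBEDDINGS)
translated = ["phở", "ngon"]   # stands in for a machine-translated training sentence
native = ["phở", "tuyệt"]      # stands in for a native Vietnamese test sentence

# Similarity using word counts only.
bow_similarity = cosine(
    bag_of_words(translated, vocabulary), bag_of_words(native, vocabulary)
)
# Similarity using word counts plus averaged embeddings.
combined_similarity = cosine(
    featurize(translated, vocabulary, TOY_EMBEDDINGS),
    featurize(native, vocabulary, TOY_EMBEDDINGS),
)
print(bow_similarity, combined_similarity)
```

In this toy setup the two sentences share only the aspect word, so the count-based vectors overlap partially, while the appended embeddings pull the combined vectors closer because the two opinion words have similar embeddings. In practice the combined vectors would be passed to a supervised classifier trained on the translated data.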