Developing a Component Comment Extractor from Product Reviews on E-Commerce Sites

Paper Title

Developing a Component Comment Extractor from Product Reviews on E-Commerce Sites

Authors

Shogo Anda, Masato Kikuchi, Tadachika Ozono

Abstract

Consumers often read product reviews to inform their buying decisions, and some want to learn about a specific component of a product. However, because typical review sentences contain a variety of details, users must pick out the sentences about the components they care about from among many reviews. Therefore, we aimed to develop a system that identifies and collects component and aspect information about products from review sentences. Our BERT-based classifiers assign labels referring to components and aspects to sentences in reviews and extract sentences that comment on specific components and aspects. To create the training data, we determined proper labels based on words identified through pattern matching in product reviews. Because we could not use the words themselves as labels, we carefully created labels covering the meanings of the words. However, the training data was imbalanced across component and aspect pairs. We introduced a data augmentation method using WordNet to reduce the bias. Our evaluation demonstrates that the system can determine labels for road bikes using pattern matching, covering more than 88% of the indicators of components and aspects on e-commerce sites. Moreover, our data augmentation method can improve the F1-measure on insufficient data from 0.66 to 0.76.
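
The abstract does not describe the augmentation procedure in detail. As a rough illustration only, the sketch below assumes that augmentation means synonym replacement and that under-represented (component, aspect) pairs are oversampled up to the size of the largest class. The `SYNONYMS` table is a hand-written, hypothetical stand-in for WordNet lookups, and the function names `augment` and `balance` are invented for this example; none of this is taken from the paper.

```python
import random
from collections import Counter

# Hypothetical stand-in for WordNet synonym lookups (assumption, not from the paper).
SYNONYMS = {
    "comfortable": ["comfy", "cozy"],
    "light": ["lightweight"],
    "sturdy": ["robust", "solid"],
}

def augment(sentence, rng):
    """Return a copy of the sentence with every known word swapped for a synonym.

    Sentences containing no word from SYNONYMS come back unchanged.
    """
    words = sentence.split()
    return " ".join(rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words)

def balance(dataset, seed=0):
    """Oversample rare (component, aspect) labels up to the majority class count."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in dataset)
    target = max(counts.values())
    balanced = list(dataset)
    for label, n in counts.items():
        pool = [s for s, l in dataset if l == label]
        for i in range(target - n):
            # Cycle through the existing sentences for this label and paraphrase each.
            balanced.append((augment(pool[i % len(pool)], rng), label))
    return balanced

# Toy data: three sentences for one label pair, one for another.
reviews = [
    ("the frame is light", ("frame", "weight")),
    ("the frame feels sturdy", ("frame", "weight")),
    ("a light frame", ("frame", "weight")),
    ("the saddle is comfortable", ("saddle", "comfort")),
]
balanced_reviews = balance(reviews)
```

With the toy data, `balance` adds two paraphrased saddle sentences so that both label pairs end up with three examples each. A real pipeline would draw synonyms from WordNet and would likely filter candidates by part of speech and word sense.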
