Paper title
Fine-grained Emotion and Intent Learning in Movie Dialogues
Paper authors
Abstract
We propose a novel large-scale emotional dialogue dataset, consisting of 1M dialogues retrieved from the OpenSubtitles corpus and annotated with 32 emotions and 9 empathetic response intents using a BERT-based fine-grained dialogue emotion classifier. This work explains the complex pipeline used to preprocess movie subtitles and select good movie dialogues to annotate. We also describe the semi-supervised learning process followed to train a fine-grained emotion classifier to annotate these dialogues. Despite the large set of labels, our dialogue emotion classifier achieved an accuracy of $65\%$ and was used to annotate 1M emotional movie dialogues from OpenSubtitles. This scale of emotional dialogue classification has never been attempted before, both in terms of dataset size and fine-grained emotion and intent categories. Visualization techniques used to analyze the quality of the resultant dataset suggest that it conforms to the patterns of human social interaction.