An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com

Paper Title

An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com

Authors

Shadi Shahsavari, Ehsan Ebrahimzadeh, Behnam Shahbazi, Misagh Falahi, Pavan Holur, Roja Bandari, Timothy R. Tangherlini, Vwani Roychowdhury

Abstract

Reader reviews of literary fiction on social media, especially those in persistent, dedicated forums, create and are in turn driven by underlying narrative frameworks. In their comments about a novel, readers generally include only a subset of characters and their relationships, thus offering a limited perspective on that work. Yet in aggregate, these reviews capture an underlying narrative framework composed of different actants (people, places, things), their roles, and their interactions, which we label the "consensus narrative framework". We represent this framework in the form of an actant-relationship story graph. Extracting this graph is a challenging computational problem, which we pose as a latent graphical model estimation problem. Posts and reviews are viewed as samples of subgraphs (networks) of the hidden narrative framework. Inspired by the qualitative narrative theory of Greimas, we formulate a graphical generative Machine Learning (ML) model in which nodes represent actants, and multi-edges and self-loops among nodes capture context-specific relationships. We develop a pipeline of interlocking automated methods to extract key actants and their relationships, and apply it to thousands of reviews and comments posted on Goodreads.com. We manually derive the ground truth narrative framework from SparkNotes, and then use word embedding tools to compare relationships in the ground truth networks with those in our extracted networks. We find that our automated methodology generates highly accurate consensus narrative frameworks: for our four target novels, with approximately 2900 reviews per novel, we report an average coverage/recall of important relationships of > 80% and an average edge detection rate of > 89%. These extracted narrative frameworks can generate insight into how people (or classes of people) read and how they recount what they have read to others.
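
The idea of treating each review as a sampled subgraph of a hidden story graph can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`aggregate_reviews`, `edge_recall`), the `min_support` threshold, and the toy actants and relations are all hypothetical, and the exact matching of actant pairs used here stands in for the word-embedding comparison of relationships that the abstract describes.

```python
from collections import Counter


def aggregate_reviews(review_triples, min_support=2):
    """Merge per-review subgraphs into one consensus edge set.

    Each review is a list of (source, relation, target) triples.
    Several relations between the same pair are kept separately
    (multi-edges), and source == target is allowed (self-loops).
    A triple is kept if at least `min_support` reviews mention it.
    """
    counts = Counter()
    for triples in review_triples:
        counts.update(set(triples))  # count each triple once per review
    return {triple for triple, n in counts.items() if n >= min_support}


def edge_recall(extracted, ground_truth):
    """Share of ground-truth actant pairs linked by any extracted edge.

    Direction and relation label are ignored (an assumption of this sketch).
    """
    def pairs(triples):
        return {frozenset((src, dst)) for src, _, dst in triples}

    truth = pairs(ground_truth)
    if not truth:
        return 0.0
    return len(truth & pairs(extracted)) / len(truth)


# Hypothetical toy data: four short "reviews", each a partial view.
reviews = [
    [("hero", "travels_with", "mentor"), ("hero", "fights", "dragon")],
    [("hero", "fights", "dragon"), ("hero", "doubts", "hero")],
    [("hero", "travels_with", "mentor"), ("hero", "doubts", "hero")],
    [("mentor", "warns", "hero")],
]
ground_truth = [
    ("hero", "travels_with", "mentor"),
    ("hero", "fights", "dragon"),
    ("dragon", "guards", "treasure"),
    ("hero", "doubts", "hero"),
]

consensus = aggregate_reviews(reviews, min_support=2)
recall = edge_recall(consensus, ground_truth)
print(sorted(consensus))
print(f"edge recall: {recall:.2f}")
```

No single toy review contains the whole graph, yet the aggregate recovers three of the four ground-truth pairs; the pair that no review mentions is exactly the kind of relationship a recall figure would count as missed.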
