Paper Title

Efficient and Interpretable Neural Models for Entity Tracking

Authors

Toshniwal, Shubham

Abstract

What would it take for a natural language model to understand a novel, such as The Lord of the Rings? Among other things, such a model must be able to: (a) identify and record new characters (entities) and their attributes as they are introduced in the text, and (b) identify subsequent references to the characters previously introduced and update their attributes. This problem of entity tracking is essential for language understanding, and thus useful for a wide array of downstream applications in NLP, such as question answering and summarization. In this thesis, we focus on two key problems in relation to facilitating the use of entity tracking models: (i) scaling entity tracking models to long documents, such as a novel, and (ii) integrating entity tracking into language models. Applying language technologies to long documents has garnered interest recently, but computational constraints are a significant bottleneck in scaling up current methods. In this thesis, we argue that computationally efficient entity tracking models can be developed by representing entities with rich, fixed-dimensional vector representations derived from pretrained language models, and by exploiting the ephemeral nature of entities. We also argue for the integration of entity tracking into language models as it will allow for: (i) wider application given the current ubiquitous use of pretrained language models in NLP applications, and (ii) easier adoption since it is much easier to swap in a new pretrained language model than to integrate a separate standalone entity tracking model.
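The abstract's core efficiency argument, fixed-dimensional entity vectors plus a bounded memory that exploits the ephemeral nature of entities, can be illustrated with a toy sketch. This is not the thesis's actual model: the gated-average update, the `max_entities` cap, the least-recently-mentioned eviction rule, and all names and dimensions below are illustrative assumptions.

```python
from dataclasses import dataclass, field

DIM = 4  # toy fixed dimensionality; real models would use LM hidden sizes


@dataclass
class EntityMemory:
    """Bounded memory of fixed-dimensional entity vectors.

    Illustrative only: evicting the least-recently-mentioned entity
    stands in for exploiting entity ephemerality (most entities stop
    being mentioned, so memory need not grow with document length).
    """
    max_entities: int = 2
    step: int = 0
    # entity name -> (fixed-dim vector, step of last mention)
    entities: dict = field(default_factory=dict)

    def update(self, name, mention_vec, gate=0.5):
        self.step += 1
        if name in self.entities:
            # known entity: gated interpolation of old state and new mention
            old, _ = self.entities[name]
            new = [(1 - gate) * o + gate * m for o, m in zip(old, mention_vec)]
        else:
            # new entity: initialize from the mention representation
            new = list(mention_vec)
            if len(self.entities) >= self.max_entities:
                # evict the least-recently-mentioned entity
                stale = min(self.entities, key=lambda k: self.entities[k][1])
                del self.entities[stale]
        self.entities[name] = (new, self.step)


mem = EntityMemory()
mem.update("Frodo", [1.0, 0.0, 0.0, 0.0])
mem.update("Gandalf", [0.0, 1.0, 0.0, 0.0])
mem.update("Frodo", [0.0, 0.0, 1.0, 0.0])   # update of an existing entity
mem.update("Sauron", [0.0, 0.0, 0.0, 1.0])  # evicts Gandalf (least recent)
```

Because every entity occupies one fixed-size vector and stale entities are evicted, total memory stays constant no matter how long the document grows, which is the property the abstract argues makes entity tracking scalable to novels.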
