Paper Title


LSTMs Compose (and Learn) Bottom-Up

Authors

Naomi Saphra, Adam Lopez

Abstract


Recent work in NLP shows that LSTM language models capture hierarchical structure in language data. In contrast to existing work, we consider the learning process that leads to their compositional behavior. For a closer look at how an LSTM's sequential representations are composed hierarchically, we present a related measure of Decompositional Interdependence (DI) between word meanings in an LSTM, based on their gate interactions. We connect this measure to syntax with experiments on English language data, where DI is higher on pairs of words with lower syntactic distance. To explore the inductive biases that cause these compositional representations to arise during training, we conduct simple experiments on synthetic data. These synthetic experiments support a specific hypothesis about how hierarchical structures are discovered over the course of training: that LSTM constituent representations are learned bottom-up, relying on effective representations of their shorter children, rather than learning the longer-range relations independently from children.
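
As a rough illustration of the intuition behind DI (not the authors' gate-based formulation, which relies on a contextual decomposition of the LSTM's internal states), the interdependence of two word positions can be approximated with a simple ablation-based interaction score. The toy model, dimensions, and the `di_proxy` helper below are hypothetical placeholders and only sketch the idea that strongly interacting word pairs should score higher.

```python
# Illustrative sketch only: an ablation-based proxy for Decompositional
# Interdependence (DI) between two word positions. The paper derives DI from
# gate interactions via contextual decomposition; here the "interaction" is
# approximated by zeroing word embeddings, which is NOT the paper's method.
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, emb_dim, hidden_dim, seq_len = 100, 16, 32, 6
embed = nn.Embedding(vocab_size, emb_dim)      # toy, untrained components
lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

@torch.no_grad()
def final_hidden(token_ids, zero_positions=()):
    """Run the LSTM and return its final hidden state, with the embeddings
    at `zero_positions` zeroed out (a crude ablation of those words)."""
    x = embed(token_ids).clone()
    for pos in zero_positions:
        x[:, pos, :] = 0.0
    _, (h, _) = lstm(x)
    return h[-1, 0]          # last layer, first batch element

@torch.no_grad()
def di_proxy(token_ids, i, j, eps=1e-8):
    """Relative size of the interaction term for positions i and j:
    h(full) - h(without i) - h(without j) + h(without both)."""
    h_full = final_hidden(token_ids)
    h_no_i = final_hidden(token_ids, (i,))
    h_no_j = final_hidden(token_ids, (j,))
    h_none = final_hidden(token_ids, (i, j))
    interaction = h_full - h_no_i - h_no_j + h_none
    return float(interaction.norm() / (h_full.norm() + eps))

sentence = torch.randint(0, vocab_size, (1, seq_len))
print(di_proxy(sentence, 1, 2))   # e.g. a syntactically close pair
print(di_proxy(sentence, 0, 5))   # e.g. a more distant pair
```

Under the paper's hypothesis, scores of this kind would be systematically higher for word pairs at lower syntactic distance once the model is trained on real language data.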
