Paper Title

On the Practical Ability of Recurrent Neural Networks to Recognize Hierarchical Languages

Paper Authors

Satwik Bhattamishra, Kabir Ahuja, Navin Goyal

Paper Abstract

While recurrent models have been effective in NLP tasks, their performance on context-free languages (CFLs) has been found to be quite weak. Given that CFLs are believed to capture important phenomena such as hierarchical structure in natural languages, this discrepancy in performance calls for an explanation. We study the performance of recurrent models on Dyck-n languages, a particularly important and well-studied class of CFLs. We find that while recurrent models generalize nearly perfectly if the lengths of the training and test strings are from the same range, they perform poorly if the test strings are longer. At the same time, we observe that recurrent models are expressive enough to recognize Dyck words of arbitrary lengths in finite precision if their depths are bounded. Hence, we evaluate our models on samples generated from Dyck languages with bounded depth and find that they are indeed able to generalize to much higher lengths. Since natural language datasets have nested dependencies of bounded depth, this may help explain why they perform well in modeling hierarchical dependencies in natural language data despite prior works indicating poor generalization performance on Dyck languages. We perform probing studies to support our results and provide comparisons with Transformers.
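As a concrete illustration of the bounded-depth setting discussed in the abstract, the sketch below samples Dyck-2 words whose nesting depth never exceeds a fixed bound and checks membership with an explicit stack. The function names (`generate_bounded_dyck`, `is_dyck`), bracket vocabulary, and sampling probabilities are illustrative assumptions, not the paper's actual data-generation procedure.

```python
import random

# Dyck-2: two bracket types. A word is in the language if every bracket is
# properly nested and matched; its depth is the maximum stack size reached.
BRACKETS = [("(", ")"), ("[", "]")]

def generate_bounded_dyck(max_len=50, max_depth=3, p_open=0.5):
    """Sample a well-formed Dyck word whose nesting depth stays <= max_depth.
    (Hypothetical helper; parameters are illustrative choices.)"""
    word, stack = [], []
    while len(word) < max_len:
        # Close a bracket if we are at the depth bound or by a coin flip;
        # otherwise open a new one (we must open when the stack is empty).
        if stack and (len(stack) >= max_depth or random.random() > p_open):
            word.append(stack.pop())
        else:
            left, right = random.choice(BRACKETS)
            stack.append(right)
            word.append(left)
    # Close any brackets still open so the word is well formed.
    while stack:
        word.append(stack.pop())
    return "".join(word)

def is_dyck(word, max_depth=None):
    """Recognize Dyck-2 membership (optionally enforcing the depth bound)."""
    expected_close = {left: right for left, right in BRACKETS}
    stack = []
    for ch in word:
        if ch in expected_close:
            stack.append(expected_close[ch])
            if max_depth is not None and len(stack) > max_depth:
                return False
        elif not stack or stack.pop() != ch:
            return False
    return not stack

if __name__ == "__main__":
    w = generate_bounded_dyck(max_len=40, max_depth=3)
    print(w, is_dyck(w, max_depth=3))
```

The stack-based recognizer above makes the intuition behind the paper's expressivity observation concrete: when the depth is bounded, the stack only ever holds a bounded amount of information, so a finite-precision recurrent state can, in principle, track it for arbitrarily long words.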
