Paper Title
How Can Self-Attention Networks Recognize Dyck-n Languages?
Paper Authors
Paper Abstract
We focus on the recognition of Dyck-n ($\mathcal{D}_n$) languages with self-attention (SA) networks, which has been deemed a difficult task for these networks. We compare the performance of two variants of SA, one with a starting symbol (SA$^+$) and one without (SA$^-$). Our results show that SA$^+$ is able to generalize to longer sequences and deeper dependencies. For $\mathcal{D}_2$, we find that SA$^-$ completely breaks down on long sequences, whereas the accuracy of SA$^+$ is 58.82\%. We find the attention maps learned by SA$^+$ to be amenable to interpretation and compatible with a stack-based language recognizer. Surprisingly, the performance of SA networks is on par with LSTMs, which provides evidence of the ability of SA to learn hierarchies without recursion.
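For context, a Dyck-n word is a sequence over n bracket types that is correctly nested, and the stack-based recognizer the abstract alludes to can be sketched with a simple push/pop routine. The following is a minimal illustrative Python sketch, not code from the paper; the function name `is_dyck_n` and the `pairs` mapping are our own assumptions.

```python
from typing import Sequence, Dict

def is_dyck_n(tokens: Sequence[str], pairs: Dict[str, str]) -> bool:
    """Check whether a token sequence is a well-formed Dyck-n word.

    `pairs` maps each opening bracket to its closing bracket,
    e.g. {"(": ")", "[": "]"} for Dyck-2.
    """
    stack = []
    for tok in tokens:
        if tok in pairs:                  # opening bracket: push the expected closer
            stack.append(pairs[tok])
        elif stack and tok == stack[-1]:  # matching closing bracket: pop
            stack.pop()
        else:                             # unmatched closer or unknown symbol
            return False
    return not stack                      # well-formed iff nothing is left open

# Example with Dyck-2 (two bracket types, nested dependencies)
print(is_dyck_n(list("([()])"), {"(": ")", "[": "]"}))  # True
print(is_dyck_n(list("([)]"),   {"(": ")", "[": "]"}))  # False
```

Under this view, the paper's question is whether an SA network's attention patterns can emulate the bookkeeping that the explicit stack performs here.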