Paper Title
How Can Self-Attention Networks Recognize Dyck-n Languages?
Paper Authors
Paper Abstract
We focus on the recognition of Dyck-n ($\mathcal{D}_n$) languages with self-attention (SA) networks, which has been deemed a difficult task for these networks. We compare the performance of two variants of SA, one with a starting symbol (SA$^+$) and one without (SA$^-$). Our results show that SA$^+$ is able to generalize to longer sequences and deeper dependencies. For $\mathcal{D}_2$, we find that SA$^-$ completely breaks down on long sequences, whereas the accuracy of SA$^+$ is 58.82\%. We find the attention maps learned by SA$^+$ to be amenable to interpretation and compatible with a stack-based language recognizer. Surprisingly, the performance of SA networks is on par with LSTMs, which provides evidence of the ability of SA to learn hierarchies without recursion.
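For context, a Dyck-n word is a sequence over n bracket types that is correctly nested, and the stack-based recognizer the abstract alludes to can be sketched with a simple push/pop routine. The following is a minimal illustrative Python sketch, not code from the paper; the function name `is_dyck_n` and the `pairs` mapping are our own assumptions.

```python
from typing import Sequence, Dict

def is_dyck_n(tokens: Sequence[str], pairs: Dict[str, str]) -> bool:
    """Check whether a token sequence is a well-formed Dyck-n word.

    `pairs` maps each opening bracket to its closing bracket,
    e.g. {"(": ")", "[": "]"} for Dyck-2.
    """
    stack = []
    for tok in tokens:
        if tok in pairs:                  # opening bracket: push the expected closer
            stack.append(pairs[tok])
        elif stack and tok == stack[-1]:  # matching closing bracket: pop
            stack.pop()
        else:                             # unmatched closer or unknown symbol
            return False
    return not stack                      # well-formed iff nothing is left open

# Example with Dyck-2 (two bracket types, nested dependencies)
print(is_dyck_n(list("([()])"), {"(": ")", "[": "]"}))  # True
print(is_dyck_n(list("([)]"),   {"(": ")", "[": "]"}))  # False
```

Under this view, the paper's question is whether an SA network's attention patterns can emulate the bookkeeping that the explicit stack performs here.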