Paper Title
ChordMixer: A Scalable Neural Attention Model for Sequences with Different Lengths
Paper Authors
Paper Abstract
Sequential data naturally have different lengths in many domains, with some very long sequences. As an important modeling tool, neural attention should capture long-range interactions in such sequences. However, most existing neural attention models admit only short sequences, or they have to employ chunking or padding to enforce a constant input length. Here we propose a simple neural network building block called ChordMixer which can model the attention for long sequences with variable lengths. Each ChordMixer block consists of a position-wise rotation layer without learnable parameters and an element-wise MLP layer. Repeatedly applying such blocks forms an effective network backbone that mixes the input signals towards the learning targets. We have tested ChordMixer on the synthetic adding problem, long document classification, and DNA sequence-based taxonomy classification. The experimental results show that our method substantially outperforms other neural attention models.
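To make the block structure described in the abstract concrete, below is a minimal PyTorch sketch of a ChordMixer-style block: a parameter-free rotation that circularly shifts successive channel groups by exponentially growing offsets (a Chord-like schedule), followed by a shared MLP applied independently at each position. The class name, dimensions, residual connection, and exact rotation schedule are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class ChordMixerBlock(nn.Module):
    """Sketch of one ChordMixer-style block (assumed structure, not official code):
    a rotation layer with no learnable parameters, then a per-position MLP."""

    def __init__(self, dim: int, hidden_dim: int, n_tracks: int):
        super().__init__()
        self.track_dim = dim // n_tracks  # channels per track (assumes dim is divisible)
        self.mlp = nn.Sequential(         # element-wise MLP: applied to each position independently
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def rotate(self, x: torch.Tensor) -> torch.Tensor:
        # Parameter-free rotation: shift the k-th group of channels by 2**k positions
        # along the sequence, so stacked blocks let distant positions exchange information.
        chunks = x.split(self.track_dim, dim=0 if x.dim() == 1 else 1)  # x: (seq_len, dim)
        rotated = [torch.roll(c, shifts=2 ** k, dims=0) for k, c in enumerate(chunks)]
        return torch.cat(rotated, dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, dim); no chunking or padding is needed, any seq_len works.
        return x + self.mlp(self.rotate(x))


# Usage sketch: stack enough blocks for the rotations to cover the longest sequence.
x = torch.randn(1000, 64)  # a length-1000 sequence with 64 channels
block = ChordMixerBlock(dim=64, hidden_dim=128, n_tracks=8)
print(block(x).shape)      # torch.Size([1000, 64])
```

Because the rotation offsets grow geometrically, a stack of roughly log-many blocks suffices for every position to influence every other position, which is what allows the backbone to handle long, variable-length inputs without padding.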