Title

Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding

Authors

Pu Wang, Hugo Van hamme

Abstract

End-to-end spoken language understanding (SLU) systems benefit from pre-training on large corpora, followed by fine-tuning on application-specific data. The resulting models are too large for on-edge applications. For instance, BERT-based systems contain over 110M parameters. Observing that the model is overparameterized, we propose a lean transformer structure where the dimension of the attention mechanism is automatically reduced using group sparsity. We propose a variant where the learned attention subspace is transferred to an attention bottleneck layer. In a low-resource setting and without pre-training, the resulting compact SLU model achieves accuracies competitive with pre-trained large models.
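To make the two ideas in the abstract concrete, the sketch below shows, in PyTorch, a self-attention layer whose query/key projections pass through a low-rank bottleneck, together with a group-lasso penalty of the kind that can shrink the attention dimension during training. This is an illustrative sketch only, not the authors' implementation: the class and function names, the dimensions (d_model=256, n_heads=4, d_bottleneck=32), and the exact form of the penalty are assumptions made for the example.

```python
# Minimal sketch (not the authors' released code) of an attention layer with a
# low-rank query/key bottleneck, plus a group-lasso penalty of the kind that
# could shrink the attention dimension automatically during training.
# The dimensions and the penalty form are illustrative assumptions.
import math
import torch
import torch.nn as nn


class BottleneckSelfAttention(nn.Module):
    """Self-attention whose queries/keys live in a d_bottleneck-dim subspace."""

    def __init__(self, d_model=256, n_heads=4, d_bottleneck=32):
        super().__init__()
        assert d_bottleneck % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_bottleneck // n_heads          # per-head bottleneck width
        # Low-rank maps: d_model -> d_bottleneck instead of d_model -> d_model.
        self.q_proj = nn.Linear(d_model, d_bottleneck)
        self.k_proj = nn.Linear(d_model, d_bottleneck)
        # Values and the output projection keep the full model dimension.
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                               # x: (batch, time, d_model)
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_heads, -1).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        ctx = scores.softmax(dim=-1) @ v                # (b, heads, t, d_model/heads)
        return self.out_proj(ctx.transpose(1, 2).reshape(b, t, -1))


def attention_group_penalty(layer, eps=1e-8):
    """Group-lasso term with one group per bottleneck dimension (row of W_q, W_k).

    Added to the training loss, it pushes whole rows toward zero, so the
    effective attention dimension is learned rather than fixed by hand.
    """
    penalty = 0.0
    for proj in (layer.q_proj, layer.k_proj):
        penalty = penalty + proj.weight.pow(2).sum(dim=1).add(eps).sqrt().sum()
    return penalty


if __name__ == "__main__":
    layer = BottleneckSelfAttention()
    out = layer(torch.randn(2, 50, 256))                # e.g. 50 acoustic frames
    print(out.shape, attention_group_penalty(layer).item())
```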
