Paper Title

Bayesian Attention Modules

Authors

Xinjie Fan, Shujian Zhang, Bo Chen, Mingyuan Zhou

Abstract

Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability. Most current models use deterministic attention modules due to their simplicity and ease of optimization. Stochastic counterparts, on the other hand, are less popular despite their potential benefits. The main reason is that stochastic attention often introduces optimization issues or requires significant model changes. In this paper, we propose a scalable stochastic version of attention that is easy to implement and optimize. We construct simplex-constrained attention distributions by normalizing reparameterizable distributions, making the training process differentiable. We learn their parameters in a Bayesian framework where a data-dependent prior is introduced for regularization. We apply the proposed stochastic attention modules to various attention-based models, with applications to graph node classification, visual question answering, image captioning, machine translation, and language understanding. Our experiments show the proposed method brings consistent improvements over the corresponding baselines.
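To make the construction in the abstract concrete, below is a minimal sketch (not the authors' released code) of stochastic attention built by normalizing reparameterizable samples onto the simplex. It draws Weibull samples whose scale is tied to the usual attention logits and normalizes them so that sampling stays differentiable. The function name `bayesian_attention`, the fixed shape parameter `k`, and the exact way the Weibull scale is tied to the logits are illustrative assumptions, and the KL regularization against the data-dependent prior mentioned in the abstract is omitted for brevity.

```python
# Minimal sketch (assumptions noted above): stochastic attention weights
# obtained by normalizing reparameterizable Weibull samples.
import torch

def bayesian_attention(scores, k=10.0, eps=1e-8):
    """scores: (..., n_keys) unnormalized attention logits.
    Returns simplex-constrained stochastic attention weights."""
    # Per-row shift does not change the normalized weights; it only
    # keeps exp() numerically stable.
    lam = torch.exp(scores - scores.max(dim=-1, keepdim=True).values)
    u = torch.rand_like(scores).clamp(eps, 1 - eps)
    # Reparameterized Weibull(k, lam) sample via the inverse CDF:
    # x = lam * (-log(1 - u))**(1/k), differentiable w.r.t. lam.
    w = lam * (-torch.log1p(-u)).pow(1.0 / k)
    # Normalize the positive samples onto the probability simplex.
    return w / (w.sum(dim=-1, keepdim=True) + eps)

# Usage: drop-in replacement for softmax attention weights.
q = torch.randn(2, 4, 8)        # (batch, queries, dim)
keys = torch.randn(2, 6, 8)     # (batch, keys, dim)
v = torch.randn(2, 6, 8)
scores = q @ keys.transpose(-1, -2) / 8 ** 0.5
attn = bayesian_attention(scores)   # stochastic, differentiable
out = attn @ v
```

At inference one could draw several samples and average the outputs, or fall back to deterministic softmax weights; the Weibull base distribution is used here only because its inverse-CDF reparameterization is a one-liner, and other reparameterizable positive distributions would fit the same normalize-to-simplex recipe.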
