Paper Title

Locally Aggregated Feature Attribution on Natural Language Model Understanding

Authors

Sheng Zhang, Jin Wang, Haitao Jiang, Rui Song

Abstract

With the growing popularity of deep-learning models, model understanding becomes increasingly important. Much effort has been devoted to demystifying deep neural networks for better interpretability. Some feature attribution methods have shown promising results in computer vision, especially gradient-based methods, where effectively smoothing the gradients with reference data is key to robust and faithful results. However, directly applying these gradient-based methods to NLP tasks is not trivial, because the input consists of discrete tokens and the "reference" tokens are not explicitly defined. In this work, we propose Locally Aggregated Feature Attribution (LAFA), a novel gradient-based feature attribution method for NLP models. Instead of relying on obscure reference tokens, it smooths gradients by aggregating similar reference texts derived from language model embeddings. For evaluation purposes, we also design experiments on different NLP tasks, including Entity Recognition and Sentiment Analysis on public datasets, as well as key feature detection on a constructed Amazon catalogue dataset. The superior performance of the proposed method is demonstrated through experiments.
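To make the general idea concrete, here is a minimal, hedged sketch of gradient-based attribution with reference smoothing: the gradient is averaged over an input embedding and a few similar "reference" embeddings before being multiplied by the input (a SmoothGrad-style scheme in the spirit of the abstract, not the authors' exact LAFA algorithm). The toy model, the function names, and the synthetic neighbors are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable "model": score(e) = w . tanh(e), so the gradient
# with respect to the embedding e is w * (1 - tanh(e)**2), available
# in closed form (stand-in for a real language model's gradient).
w = rng.normal(size=8)

def score_grad(e):
    # Analytic gradient of w . tanh(e) w.r.t. e.
    return w * (1.0 - np.tanh(e) ** 2)

def smoothed_attribution(e, neighbors):
    """Gradient-times-input attribution, with the gradient averaged
    over the input embedding and nearby reference embeddings."""
    grads = [score_grad(e)] + [score_grad(n) for n in neighbors]
    return e * np.mean(grads, axis=0)

# A token embedding and a few similar reference embeddings
# (stand-ins for neighbors retrieved from an embedding space).
e = rng.normal(size=8)
neighbors = [e + 0.05 * rng.normal(size=8) for _ in range(5)]

attr = smoothed_attribution(e, neighbors)
print(attr.shape)  # (8,)
```

The averaging step is what replaces the single, ill-defined "reference" token of image-style baselines: instead of perturbing a discrete token, the gradient is stabilized across texts that are semantically close in embedding space.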
