Paper Title
Understanding Neural Abstractive Summarization Models via Uncertainty
Paper Authors
Paper Abstract
An advantage of seq2seq abstractive summarization models is that they generate text in a free-form manner, but this flexibility makes it difficult to interpret model behavior. In this work, we analyze summarization decoders in both black-box and white-box ways by studying the entropy, or uncertainty, of the model's token-level predictions. For two strong pre-trained models, PEGASUS and BART, on two summarization datasets, we find a strong correlation between low prediction entropy and cases where the model copies tokens from the source rather than generating novel text. The decoder's uncertainty also connects to factors such as sentence position and the syntactic distance between adjacent pairs of tokens, giving a sense of which factors make a context particularly selective for the model's next output token. Finally, we study the relationship between decoder uncertainty and attention behavior to understand how attention gives rise to these observed effects in the model. We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
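As a rough illustration of the abstract's core measurement, the sketch below computes the entropy of a BART model's token-level predictions under teacher forcing and flags summary tokens whose ids also appear in the source, a crude proxy for copying. This is a minimal sketch using the Hugging Face transformers library; the facebook/bart-large-cnn checkpoint, the toy document, and the copy heuristic are illustrative assumptions, not the paper's exact setup.

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

# Assumed checkpoint for illustration; the paper uses BART and PEGASUS.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
model.eval()

# Toy document and summary, for illustration only.
document = ("The city council approved the new transit plan on Tuesday, "
            "citing rising congestion downtown.")
summary = "The council approved the transit plan on Tuesday."

inputs = tokenizer(document, return_tensors="pt", truncation=True)
labels = tokenizer(summary, return_tensors="pt", truncation=True).input_ids

with torch.no_grad():
    # With `labels` given, decoder inputs are shifted internally, so
    # logits[:, t] is the model's distribution over the token at labels[:, t].
    logits = model(**inputs, labels=labels).logits

# Shannon entropy (in nats) of each next-token distribution.
probs = torch.softmax(logits, dim=-1)
entropies = -(probs * torch.log(probs + 1e-12)).sum(dim=-1).squeeze(0)

# Crude copy heuristic: a summary token counts as "copied" if its id
# also occurs anywhere in the source document.
source_ids = set(inputs.input_ids.squeeze(0).tolist())
for tok_id, h in zip(labels.squeeze(0).tolist(), entropies.tolist()):
    copied = "copy" if tok_id in source_ids else "new"
    print(f"{tokenizer.decode([tok_id]):>12s}  H={h:5.3f}  [{copied}]")
```

Under the paper's finding, the tokens tagged "copy" here would tend to show lower entropy H than the "new" tokens, though this toy heuristic ignores subtleties such as repeated tokens and subword alignment.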