Paper Title


BERTnesia: Investigating the capture and forgetting of knowledge in BERT

Paper Authors

Jonas Wallat, Jaspreet Singh, Avishek Anand

Paper Abstract


Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this paper, we probe BERT specifically to understand and measure the relational knowledge it captures. We utilize knowledge base completion tasks to probe every layer of pre-trained as well as fine-tuned BERT (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17-60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten, but the extent of forgetting is impacted by the fine-tuning objective and not the size of the dataset. We find that ranking models forget the least and retain more knowledge in their final layer. We release our code on GitHub to repeat the experiments.
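The abstract describes probing each BERT layer with knowledge base completion (cloze-style) queries. Below is a minimal, illustrative sketch of that idea using the HuggingFace transformers library and bert-base-uncased; it is not the authors' released code. For simplicity it reuses the pre-trained masked-LM head on every intermediate layer, whereas the paper's probes are trained per layer, so treat this only as a rough approximation of the layer-wise probing setup.

```python
# Sketch: layer-wise relational-knowledge probing with a cloze query.
# Assumptions: HuggingFace transformers, bert-base-uncased, and reuse of the
# pre-trained MLM head on intermediate layers (a simplification of the paper).
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Cloze-style relational query, as in knowledge base completion probes.
query = "The capital of France is [MASK]."
inputs = tokenizer(query, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple: embeddings output plus one tensor per layer,
# each of shape [batch, seq_len, hidden_size].
for layer_idx, hidden in enumerate(outputs.hidden_states[1:], start=1):
    logits = model.cls(hidden)                      # decode with the MLM head
    token_id = logits[0, mask_pos].argmax(dim=-1)   # top prediction at the mask
    prediction = tokenizer.decode(token_id)
    print(f"layer {layer_idx:2d}: {prediction}")
```

Running this prints the filled-in token for each of the 12 layers, which makes it easy to see at which depth a given fact (here, "paris") first becomes recoverable from the representations.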
