Paper Title
Exploring BERT Parameter Efficiency on the Stanford Question Answering Dataset v2.0
Paper Authors
Paper Abstract
In this paper we explore the parameter efficiency of BERT (arXiv:1810.04805) on version 2.0 of the Stanford Question Answering Dataset (SQuAD2.0). We evaluate the parameter efficiency of BERT while freezing a varying number of final transformer layers, as well as while including the adapter layers proposed in arXiv:1902.00751. Additionally, we experiment with context-aware convolutional (CACNN) filters, as described in arXiv:1709.08294v3, as a final augmentation layer for the SQuAD2.0 task. This exploration is motivated in part by arXiv:1907.10597, which makes a compelling case for broadening the evaluation criteria of artificial intelligence models to include various measures of resource efficiency. While we do not evaluate these models on floating point operation efficiency as proposed in arXiv:1907.10597, we examine efficiency with respect to training time, inference time, and total number of model parameters. Our results largely corroborate those of arXiv:1902.00751 for adapter modules, while also demonstrating that the gains in F1 score from adding context-aware convolutional filters are impractical given the accompanying increase in training and inference time.
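The two parameter-efficiency strategies named in the abstract, freezing the final transformer layers and inserting bottleneck adapter modules, can be sketched concretely. The Python below is a minimal illustration only, not the authors' code: it assumes the Hugging Face transformers BertModel, and the frozen-layer count and adapter bottleneck width are arbitrary placeholders.

import torch
import torch.nn as nn
from transformers import BertModel

class Adapter(nn.Module):
    # Bottleneck adapter in the style of arXiv:1902.00751: project down to a
    # small bottleneck, apply a nonlinearity, project back up, and add a
    # residual connection so the module starts near the identity function.
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

def freeze_final_layers(model: BertModel, num_final: int) -> None:
    # Exclude the last `num_final` transformer layers from gradient updates.
    for layer in model.encoder.layer[-num_final:]:
        for param in layer.parameters():
            param.requires_grad = False

model = BertModel.from_pretrained("bert-base-uncased")
freeze_final_layers(model, num_final=4)  # placeholder: freeze the top 4 of 12 layers
adapter = Adapter(model.config.hidden_size)  # would be inserted inside each layer
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")

In the full adapter setup of arXiv:1902.00751, two such modules are inserted into every transformer layer (after the attention and feed-forward sublayers), and only the adapters, layer norms, and task head are trained, which is what keeps the per-task parameter count small.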