Paper Title
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Paper Authors
Paper Abstract
In this paper, we propose a novel model compression approach to effectively compress BERT by progressive module replacing. Our approach first divides the original BERT into several modules and builds their compact substitutes. Then, we randomly replace the original modules with their substitutes to train the compact modules to mimic the behavior of the original modules. We progressively increase the probability of replacement throughout training. In this way, our approach brings a deeper level of interaction between the original and compact models. Compared to previous knowledge distillation approaches for BERT compression, our approach does not introduce any additional loss function. Our approach outperforms existing knowledge distillation approaches on the GLUE benchmark, showing a new perspective on model compression.
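To make the mechanism concrete, below is a minimal PyTorch sketch of the idea described in the abstract: each original module is paired with a compact substitute, and during training the substitute is swapped in with a probability that grows over time. The names (`TheseusLayer`, `replace_rate`, `linear_replacement_schedule`) and the linear schedule are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch of progressive module replacing (not the official code).
import random
import torch.nn as nn

class TheseusLayer(nn.Module):
    """Pairs a frozen original (predecessor) module with a compact substitute.

    During training, the substitute replaces the predecessor with probability
    `replace_rate`; at inference time, only the substitute is used.
    """
    def __init__(self, predecessor: nn.Module, successor: nn.Module):
        super().__init__()
        self.predecessor = predecessor          # original BERT module, frozen
        self.successor = successor              # compact, trainable substitute
        self.replace_rate = 0.0                 # updated externally each step
        for p in self.predecessor.parameters():  # only substitutes are trained
            p.requires_grad = False

    def forward(self, x):
        # Bernoulli draw: keep the original module or swap in the substitute.
        if self.training and random.random() > self.replace_rate:
            return self.predecessor(x)
        return self.successor(x)

def linear_replacement_schedule(step: int, total_steps: int,
                                base_rate: float = 0.3) -> float:
    """Linearly increase the replacement probability from base_rate to 1.0,
    so early training mixes both models and late training uses only the
    compact substitutes. The specific schedule here is an assumption."""
    return min(1.0, base_rate + (1.0 - base_rate) * step / total_steps)
```

In use, one would wrap each group of original BERT layers with its substitute in a `TheseusLayer` and, at every optimization step, set `replace_rate` from the schedule. Because both models share the same forward pass and the task loss alone drives training, no extra distillation loss term is needed, which matches the abstract's claim.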