Paper Title

Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency

Paper Authors

Yanyang Li, Fuli Luo, Runxin Xu, Songfang Huang, Fei Huang, Liwei Wang

Paper Abstract


Structured pruning has been extensively studied on monolingual pre-trained language models and is yet to be fully evaluated on their multilingual counterparts. This work investigates three aspects of structured pruning on multilingual pre-trained language models: settings, algorithms, and efficiency. Experiments on nine downstream tasks show several counter-intuitive phenomena: for settings, individually pruning for each language does not induce a better result; for algorithms, the simplest method performs the best; for efficiency, a fast model does not imply that it is also small. To facilitate the comparison on all sparsity levels, we present Dynamic Sparsification, a simple approach that allows training the model once and adapting to different model sizes at inference. We hope this work fills the gap in the study of structured pruning on multilingual pre-trained models and sheds light on future research.
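To make the idea of structured pruning with an inference-time sparsity choice concrete, here is a minimal illustrative sketch, not the paper's Dynamic Sparsification algorithm. All specifics are assumptions for illustration: the head count and dimensions, the L1-norm importance heuristic, and the `head_mask` helper are hypothetical stand-ins showing how one scored model could be evaluated at several sparsity levels without retraining.

```python
import numpy as np

# Hypothetical sketch of structured (attention-head) pruning with the sparsity
# level chosen at inference time. This is NOT the paper's method; the scoring
# heuristic and dimensions below are illustrative assumptions.

rng = np.random.default_rng(0)
num_heads, head_dim, hidden = 12, 64, 768

# Per-head output-projection weights of one attention layer (random stand-ins).
W_out = rng.normal(size=(num_heads, head_dim, hidden))

# Importance score per head: L1 norm of its weights (one simple heuristic).
scores = np.abs(W_out).reshape(num_heads, -1).sum(axis=1)

def head_mask(sparsity: float) -> np.ndarray:
    """Keep the highest-scoring heads so that `sparsity` of heads are dropped."""
    keep = max(1, int(round(num_heads * (1.0 - sparsity))))
    kept = np.argsort(scores)[::-1][:keep]
    mask = np.zeros(num_heads, dtype=bool)
    mask[kept] = True
    return mask

# The same scored model can be masked at different sparsity levels at inference.
for s in (0.0, 0.25, 0.5, 0.75):
    print(f"sparsity={s:.2f} -> heads kept: {head_mask(s).sum()}")
```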
