Paper Title
Transferring Inductive Biases through Knowledge Distillation
Paper Authors
Paper Abstract
Having the right inductive biases can be crucial in many tasks or scenarios where data or computing resources are a limiting factor, or where training data is not perfectly representative of the conditions at test time. However, defining, designing, and efficiently adapting inductive biases is not necessarily straightforward. In this paper, we explore the power of knowledge distillation for transferring the effect of inductive biases from one model to another. We consider families of models with different inductive biases, LSTMs vs. Transformers and CNNs vs. MLPs, in the context of tasks and scenarios where having the right inductive biases is critical. We study the effect of inductive biases on the solutions the models converge to, and investigate how, and to what extent, the effect of inductive biases is transferred through knowledge distillation, not only in terms of performance but also in terms of other properties of the converged solutions.
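The mechanism the abstract relies on is knowledge distillation: a student model is trained to match the temperature-softened output distribution of a teacher whose architecture carries the desired inductive bias. Below is a minimal sketch of the standard distillation objective (Hinton et al., 2015) in PyTorch; the function name, temperature, and mixing weight alpha are illustrative assumptions, not the paper's exact training setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Standard knowledge-distillation objective: a weighted sum of the
    hard-label cross-entropy and the KL divergence between the
    temperature-softened teacher and student distributions.
    Hyperparameters here are placeholders, not values from the paper."""
    # Soften both distributions with the same temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale the KL term by T^2 so its gradient magnitude stays
    # comparable to the cross-entropy term as T varies.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    ce_term = F.cross_entropy(student_logits, targets)
    return alpha * kd_term + (1 - alpha) * ce_term
```

In the setting the abstract describes, the teacher would be, e.g., an LSTM (or CNN) and the student a Transformer (or MLP), so that the student's solution is pulled toward the region of function space the teacher's inductive bias favors.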