Paper Title
KNIFE: Distilling Reasoning Knowledge From Free-Text Rationales
Paper Authors
Paper Abstract
Language models (LMs) have yielded impressive results on many language reasoning tasks, but their unexpected errors raise doubts about their reasoning abilities. In light of this, there is growing interest in finetuning/prompting LMs with both task instances and their associated free-text rationales (FTRs), which explain the correct reasoning process for predicting the correct task output (i.e., how to be "right for the right reasons"). However, existing finetuning methods fail to improve LM performance, while prompting needs prohibitively large (i.e., >50B) LMs to work well. We propose KNIFE, which shows that reasoning knowledge can be effectively distilled from FTRs into a small (i.e., <1B) LM, improving the LM's performance. First, KNIFE finetunes a teacher LM (given task input and FTR) to predict the task output, transferring reasoning knowledge from the FTRs to the teacher's hidden states. Second, KNIFE finetunes a student LM (given task input only) such that its hidden states are aligned with the teacher's. Thus, the student is endowed with reasoning knowledge but can be used for inference without direct FTR input. On two question-answering datasets, KNIFE outperforms various finetuning and prompting baselines in fully-supervised and low-resource settings. Also, we observe that FTR quality is crucial to KNIFE's performance.
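To make the two-stage procedure in the abstract concrete, below is a minimal PyTorch/Transformers sketch of KNIFE-style hidden-state distillation. It is an illustration based only on the abstract, not the authors' released code: the use of T5, the toy question/rationale, the choice of the last decoder layer, and the MSE alignment loss are all assumptions.

import torch
import torch.nn.functional as F
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
teacher = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
student = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Hypothetical task instance with a free-text rationale (FTR).
question = "question: Where would you keep a dime? choices: pocket, ocean, cloud"
ftr = "explanation: A dime is a coin, and coins are kept in a pocket."
answer = "pocket"

# Stage 1 (teacher): finetune on task input + FTR to predict the task output,
# so reasoning knowledge from the FTR ends up in the teacher's hidden states.
teacher_inputs = tok(question + " " + ftr, return_tensors="pt")
labels = tok(answer, return_tensors="pt").input_ids
teacher_out = teacher(**teacher_inputs, labels=labels, output_hidden_states=True)
teacher_task_loss = teacher_out.loss  # teacher's training objective

# Stage 2 (student): the student sees the task input only; its decoder hidden
# states are pulled toward the frozen teacher's, so no FTR is needed at inference.
student_inputs = tok(question, return_tensors="pt")
with torch.no_grad():
    t_hidden = teacher(**teacher_inputs, labels=labels,
                       output_hidden_states=True).decoder_hidden_states[-1]
student_out = student(**student_inputs, labels=labels, output_hidden_states=True)
align_loss = F.mse_loss(student_out.decoder_hidden_states[-1], t_hidden)
# Hypothetical student objective: align_loss (optionally combined with
# student_out.loss); the paper's exact loss weighting is not given in the abstract.

Note that aligning decoder hidden states keeps the teacher and student tensors shape-compatible, since the decoder length is set by the shared labels even though the teacher's encoder input additionally contains the FTR.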