Paper Title

Predicting Performance for Natural Language Processing Tasks

Authors

Mengzhou Xia, Antonios Anastasopoulos, Ruochen Xu, Yiming Yang, Graham Neubig

Abstract

Given the complexity of combinations of tasks, languages, and domains in natural language processing (NLP) research, it is computationally prohibitive to exhaustively test newly proposed models on each possible experimental setting. In this work, we attempt to explore the possibility of gaining plausible judgments of how well an NLP model can perform under an experimental setting, without actually training or testing the model. To do so, we build regression models to predict the evaluation score of an NLP experiment given the experimental settings as input. Experimenting on 9 different NLP tasks, we find that our predictors can produce meaningful predictions over unseen languages and different modeling architectures, outperforming reasonable baselines as well as human experts. Going further, we outline how our predictor can be used to find a small subset of representative experiments that should be run in order to obtain plausible predictions for all other experimental settings.
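To make the core idea concrete, below is a minimal sketch of a performance predictor: a regression model fit on features of past experimental settings (task, language pair, data size) and their observed scores, then queried for a setting that was never run. This is an illustrative assumption, not the paper's actual implementation; the feature names, toy data, scores, and the choice of scikit-learn's gradient-boosted trees are all hypothetical stand-ins for whatever featurization and regressor one would use in practice.

```python
# Sketch: predict an NLP experiment's evaluation score from its
# settings, instead of training and evaluating the model itself.
# All feature names and numbers here are made-up toy values.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction import DictVectorizer

# Each record describes one completed experiment: its settings and the
# score the trained system achieved (e.g., BLEU for machine translation).
past_experiments = [
    {"task": "mt", "src_lang": "de", "tgt_lang": "en", "train_size_k": 200},
    {"task": "mt", "src_lang": "fr", "tgt_lang": "en", "train_size_k": 150},
    {"task": "mt", "src_lang": "ru", "tgt_lang": "en", "train_size_k": 90},
]
observed_scores = [28.4, 31.0, 22.5]  # hypothetical evaluation scores

# One-hot encode categorical settings; numeric settings pass through.
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(past_experiments)

# Fit a regressor mapping setting features -> evaluation score.
predictor = GradientBoostingRegressor(n_estimators=100)
predictor.fit(X, observed_scores)

# Query the predictor for an experimental setting we never ran.
unseen = {"task": "mt", "src_lang": "cs", "tgt_lang": "en", "train_size_k": 120}
print(predictor.predict(vectorizer.transform([unseen]))[0])
```

The same fitted predictor also motivates the paper's final point: with a way to score any candidate setting cheaply, one can search for a small subset of experiments whose results, once observed, make the predictions for all remaining settings sufficiently reliable.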
