Paper Title
MLE-guided parameter search for task loss minimization in neural sequence modeling
Paper Authors
Paper Abstract
Neural autoregressive sequence models are used to generate sequences in a variety of natural language processing (NLP) tasks, where they are evaluated according to sequence-level task losses. These models are typically trained with maximum likelihood estimation, which ignores the task loss, yet empirically performs well as a surrogate objective. Typical approaches to directly optimizing the task loss, such as policy gradient and minimum risk training, are based on sampling in the sequence space to obtain candidate update directions that are scored based on the loss of a single sequence. In this paper, we develop an alternative method based on random search in the parameter space that leverages access to the maximum likelihood gradient. We propose maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around the current parameters and around the maximum likelihood gradient, with each direction weighted by its improvement in the task loss. MGS shifts sampling to the parameter space, and scores candidates using losses that are pooled from multiple sequences. Our experiments show that MGS is capable of optimizing sequence-level losses, with substantial reductions in repetition and non-termination in sequence completion, and similar improvements to those of minimum risk training in machine translation.
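To make the update concrete, below is a minimal Python sketch of the MGS-style step the abstract describes: candidate update directions are drawn from a mixture of random perturbations around the current parameters and around the (negative) maximum likelihood gradient step, and are combined with weights given by their improvement in a pooled sequence-level task loss. This is an illustration inferred from the abstract, not the paper's implementation; the function and parameter names (mgs_step, k, sigma, lr), the even/odd mixture split, and the softmax weighting are all our assumptions.

import numpy as np

def mgs_step(theta, mle_grad, task_loss, k=8, sigma=0.01, lr=1.0, rng=None):
    """One MGS-style update (sketch): sample candidate directions from a
    mixture of random search around the current parameters and around the
    MLE gradient step, then combine them weighted by task-loss improvement."""
    rng = rng if rng is not None else np.random.default_rng(0)
    base_loss = task_loss(theta)  # sequence-level loss pooled over sequences

    directions, improvements = [], []
    for i in range(k):
        noise = sigma * rng.standard_normal(theta.shape)
        # Mixture component (an assumption): even draws perturb the current
        # parameters; odd draws perturb the negative MLE gradient step.
        d = noise if i % 2 == 0 else -lr * mle_grad + noise
        directions.append(d)
        improvements.append(base_loss - task_loss(theta + d))

    # Weight each direction by a softmax over its task-loss improvement, so
    # directions that reduce the loss more contribute more to the update.
    imp = np.asarray(improvements)
    w = np.exp(imp - imp.max())
    w = w / w.sum()
    return theta + sum(wi * di for wi, di in zip(w, directions))

# Toy usage with a quadratic stand-in for the task loss; in the paper's
# setting, theta would be the network parameters, mle_grad the maximum
# likelihood gradient, and task_loss a pooled sequence-level loss.
if __name__ == "__main__":
    task_loss = lambda th: float(np.sum(th ** 2))
    theta = np.ones(4)
    for _ in range(50):
        theta = mgs_step(theta, mle_grad=2 * theta, task_loss=task_loss)
    print(task_loss(theta))  # decreases toward 0

Note that, unlike policy gradient or minimum risk training, no sampling happens in sequence space here: the randomness is entirely in parameter space, and each candidate is scored by a loss pooled over multiple sequences rather than a single sampled sequence.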