Title

Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing

Authors

Clara Meister, Elizabeth Salesky, Ryan Cotterell

Abstract

Prior work has explored directly regularizing the output distributions of probabilistic models to alleviate peaky (i.e. over-confident) predictions, a common sign of overfitting. This class of techniques, of which label smoothing is one, has a connection to entropy regularization. Despite the consistent success of label smoothing across architectures and data sets in language generation tasks, two problems remain open: (1) there is little understanding of the underlying effects entropy regularizers have on models, and (2) the full space of entropy regularization techniques is largely unexplored. We introduce a parametric family of entropy regularizers, which includes label smoothing as a special case, and use it to gain a better understanding of the relationship between the entropy of a model and its performance on language generation tasks. We also find that variance in model performance can be explained largely by the resulting entropy of the model. Lastly, we find that label smoothing provably does not allow for sparsity in an output distribution, an undesirable property for language generation models, and therefore advise the use of other entropy regularization methods in its place.
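To make the two regularizers contrasted in the abstract concrete, below is a minimal NumPy sketch. It is illustrative only: the function names, the α/β weighting, and the plain-softmax setup are assumptions for exposition, not the paper's exact generalized family. It shows label smoothing as cross-entropy against a uniform-mixed target (equivalent, up to constants, to adding a KL(uniform || p) term) and a confidence penalty that instead subtracts β·H(p) (equivalent, up to a constant, to adding KL(p || uniform)).

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    return z - np.log(np.sum(np.exp(z)))

def label_smoothing_loss(logits, target, alpha=0.1):
    """Cross-entropy against a smoothed target distribution.

    The one-hot target is mixed with the uniform distribution:
        q_tilde = (1 - alpha) * one_hot(target) + alpha * uniform.
    Up to additive and multiplicative constants, this equals the usual
    negative log-likelihood plus a term proportional to KL(uniform || p),
    which diverges if any p_i -> 0, so the output can never be sparse.
    """
    log_p = log_softmax(np.asarray(logits, dtype=float))
    vocab_size = log_p.shape[-1]
    q = np.full(vocab_size, alpha / vocab_size)
    q[target] += 1.0 - alpha
    return -np.sum(q * log_p)

def confidence_penalty_loss(logits, target, beta=0.1):
    """Negative log-likelihood minus beta * H(p).

    Since KL(p || uniform) = log(V) - H(p), this is the same regularizer
    up to a constant. It stays finite when some p_i = 0, so, unlike label
    smoothing, it is compatible with sparse output distributions.
    """
    log_p = log_softmax(np.asarray(logits, dtype=float))
    p = np.exp(log_p)
    neg_entropy = np.sum(p * log_p)  # equals -H(p)
    return -log_p[target] + beta * neg_entropy

# Tiny usage example on a 4-word "vocabulary" (hypothetical numbers).
logits = np.array([3.0, 1.0, 0.2, -1.0])
print(label_smoothing_loss(logits, target=0, alpha=0.1))
print(confidence_penalty_loss(logits, target=0, beta=0.1))
```

The contrast between the two comment blocks is the abstract's sparsity argument in miniature: KL(uniform || p) blows up whenever any output probability reaches zero, while -H(p) does not, so regularizers of the latter kind leave sparse output distributions attainable.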
