Paper Title

Trade-offs in Top-k Classification Accuracies on Losses for Deep Learning

Paper Authors

Azusa Sawada, Eiji Kaneko, Kazutoshi Sagi

Paper Abstract

This paper presents an experimental analysis of trade-offs in top-k classification accuracies across losses for deep learning, and proposes a novel top-k loss. The commonly used cross entropy (CE) is not guaranteed to optimize top-k prediction without infinite training data and model complexity. The objective is to clarify when CE sacrifices top-k accuracy to optimize top-1 prediction, and to design a loss that improves top-k accuracy under such conditions. Our novel loss is basically CE modified by grouping the temporal top-k classes into a single class. To obtain a robust decision boundary, we introduce an adaptive transition from normal CE to our loss, and thus call it top-k transition loss. Our experiments demonstrate that CE is not always the best choice for learning top-k prediction. First, we explore trade-offs between top-1 and top-k (=2) accuracies on synthetic datasets, and find that CE fails to optimize top-k prediction when the data distribution is too complex for a given model to represent the optimal top-1 prediction. Second, we compare top-k accuracies on the CIFAR-100 dataset, targeting top-5 prediction in deep learning. While CE performs best in top-1 accuracy, our loss outperforms CE in top-5 accuracy except under one experimental setup. Moreover, our loss provides better top-k accuracies than CE for k larger than 10. As a result, a ResNet18 model trained with our loss reaches 99% accuracy with k=25 candidates, eight fewer than CE requires.
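The grouping idea in the abstract can be read as follows: instead of penalizing only -log p(target) as plain CE does, the grouped term penalizes -log of the summed probability of the model's current top-k classes (with the target included), and an adaptive weight blends the two. Below is a minimal PyTorch sketch under these assumptions; the function name `topk_transition_loss`, the blending weight `alpha`, and the way the target is merged into the grouped set are illustrative, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F


def topk_transition_loss(logits, target, k=5, alpha=0.5):
    """Hypothetical sketch of a top-k transition loss.

    Blends standard cross entropy (CE) with a "grouped" CE in which
    the model's current top-k classes are merged into a single class,
    so the grouped term only asks the target to fall inside that set.
    alpha=0 recovers plain CE; alpha=1 is the fully grouped loss; a
    training schedule for alpha would realize the adaptive transition.
    """
    log_probs = F.log_softmax(logits, dim=1)

    # Plain CE term: -log p(target), per sample.
    ce = F.nll_loss(log_probs, target, reduction="none")

    # The model's current ("temporal") top-k classes per sample.
    topk_idx = logits.topk(k, dim=1).indices

    # Boolean mask over classes: the top-k set plus the target itself,
    # so the grouped term stays well-defined when the target falls
    # outside the current top-k (and the target is never counted twice).
    mask = torch.zeros_like(log_probs, dtype=torch.bool)
    mask.scatter_(1, topk_idx, True)
    mask.scatter_(1, target.unsqueeze(1), True)

    # Grouped CE term: -log of the total probability mass of the set,
    # i.e. CE after merging the grouped classes into one class.
    grouped = -torch.logsumexp(
        log_probs.masked_fill(~mask, float("-inf")), dim=1
    )

    return ((1.0 - alpha) * ce + alpha * grouped).mean()


# Toy usage: 8 samples, 100 classes (as in CIFAR-100), top-5 grouping.
logits = torch.randn(8, 100, requires_grad=True)
target = torch.randint(0, 100, (8,))
loss = topk_transition_loss(logits, target, k=5, alpha=0.3)
loss.backward()
```

With alpha near 1, gradients only push probability mass into the top-k set rather than onto the single target class, which matches the abstract's intuition of trading some top-1 accuracy for better top-k accuracy.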
