Paper Title

Do We Need Zero Training Loss After Achieving Zero Training Error?

Paper Authors

Takashi Ishida, Ikko Yamane, Tomoya Sakai, Gang Niu, Masashi Sugiyama

Paper Abstract

Overparameterized deep networks have the capacity to memorize training data with zero \emph{training error}. Even after memorization, the \emph{training loss} continues to approach zero, making the model overconfident and the test performance degraded. Since existing regularizers do not directly aim to avoid zero training loss, it is hard to tune their hyperparameters in order to maintain a fixed/preset level of training loss. We propose a direct solution called \emph{flooding} that intentionally prevents further reduction of the training loss when it reaches a reasonably small value, which we call the \emph{flood level}. Our approach makes the loss float around the flood level by doing mini-batched gradient descent as usual but gradient ascent if the training loss is below the flood level. This can be implemented with one line of code and is compatible with any stochastic optimizer and other regularizers. With flooding, the model will continue to "random walk" with the same non-zero training loss, and we expect it to drift into an area with a flat loss landscape that leads to better generalization. We experimentally show that flooding improves performance and, as a byproduct, induces a double descent curve of the test loss.
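The one-line implementation mentioned in the abstract amounts to replacing the training objective J(θ) with |J(θ) − b| + b, where b is the flood level: above b the gradient is unchanged (descent as usual), and below b the gradient flips sign (ascent), so the loss floats around b. Below is a minimal PyTorch-style sketch of this idea; the function name, the flood level of 0.02, and the surrounding training-step snippet are illustrative assumptions, not the authors' released code.

```python
import torch

def flooded_loss(loss: torch.Tensor, flood_level: float) -> torch.Tensor:
    """Apply the flooding adjustment |loss - b| + b to a mini-batch loss.

    Above the flood level b, the gradient equals that of the original loss
    (gradient descent as usual); below b, the gradient flips sign, which
    performs gradient ascent and keeps the loss floating around b.
    """
    return (loss - flood_level).abs() + flood_level

# Illustrative usage inside a standard training step (model, criterion,
# optimizer, and the flood level b = 0.02 are assumptions of this sketch):
#
#   loss = criterion(model(inputs), targets)
#   loss = flooded_loss(loss, flood_level=0.02)
#   optimizer.zero_grad()
#   loss.backward()
#   optimizer.step()
```

Because the adjustment only rewrites the scalar loss, it composes with any stochastic optimizer and with other regularizers, as the abstract notes.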
