Paper Title

What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective

Paper Authors

Qilong Wang, Li Zhang, Banggu Wu, Dongwei Ren, Peihua Li, Wangmeng Zuo, Qinghua Hu

Paper Abstract

Recent works have demonstrated that global covariance pooling (GCP) can improve the performance of deep convolutional neural networks (CNNs) on visual classification tasks. Despite considerable advances, the reasons for the effectiveness of GCP on deep CNNs have not been well studied. In this paper, we make an attempt to understand what deep CNNs benefit from GCP from an optimization perspective. Specifically, we explore the effect of GCP on deep CNNs in terms of the Lipschitzness of the optimization loss and the predictiveness of gradients, and show that GCP can make the optimization landscape smoother and the gradients more predictive. Furthermore, we discuss the connection between GCP and second-order optimization for deep CNNs. More importantly, the above findings can account for several merits of covariance pooling for training deep CNNs that have not been previously recognized or fully explored, including significant acceleration of network convergence (i.e., networks trained with GCP can support rapid decay of learning rates, achieving favorable performance while significantly reducing the number of training epochs), stronger robustness to distorted examples generated by image corruptions and perturbations, and good generalization to different vision tasks, e.g., object detection and instance segmentation. We conduct extensive experiments using various deep CNN models on diversified tasks, and the results provide strong support for our findings.
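As a concrete illustration of the GCP operation the abstract refers to, here is a minimal PyTorch sketch: the usual global average pooling before the classifier is replaced by a per-sample covariance matrix of the final convolutional features, followed by matrix square-root normalization. This is an illustrative sketch, not the authors' implementation: the function name, the eps regularizer, the eigendecomposition-based square root (fast GCP variants instead use Newton-Schulz iteration), and flattening the full matrix rather than only its upper triangle are all assumptions made here for brevity.

import torch

def global_covariance_pooling(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Illustrative sketch of global covariance pooling (GCP).

    x: final convolutional feature maps, shape (N, C, H, W).
    Returns a flattened, square-root-normalized (N, C*C) covariance
    representation for the classifier, in place of the usual
    (N, C) global-average-pooled vector.
    """
    n, c, h, w = x.shape
    m = h * w
    feats = x.reshape(n, c, m)                        # per-sample C x M feature matrix
    feats = feats - feats.mean(dim=2, keepdim=True)   # center across spatial positions
    cov = feats @ feats.transpose(1, 2) / (m - 1)     # (N, C, C) sample covariance
    # Matrix square-root normalization via eigendecomposition; eps keeps cov PSD.
    identity = torch.eye(c, device=x.device, dtype=x.dtype)
    eigvals, eigvecs = torch.linalg.eigh(cov + eps * identity)
    sqrt_cov = eigvecs @ torch.diag_embed(eigvals.clamp_min(0.0).sqrt()) @ eigvecs.transpose(1, 2)
    return sqrt_cov.flatten(1)

# Usage with a hypothetical backbone producing (N, 256, 14, 14) features:
# z = global_covariance_pooling(backbone(images))   # (N, 256*256) input to the classifier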
