Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives

Paper Title


Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives

Authors

Duo Li, Qifeng Chen

Abstract


While modern Convolutional Neural Networks (CNNs) surpass the pioneering networks in depth by a significant margin, the traditional approach of applying supervision only to the final classifier and progressively propagating the gradient flow upstream remains the mainstay of training. The seminal Deeply-Supervised Networks (DSN) were proposed to alleviate the optimization difficulty that arises when gradients flow through a long chain of layers. However, DSN remains vulnerable to issues including interference with the hierarchical representation generation process and inconsistent optimization objectives, as this paper shows both theoretically and empirically. Complementary to previous training strategies, we propose Dynamic Hierarchical Mimicking, a generic feature learning mechanism that advances CNN training with enhanced generalization ability. Partially inspired by DSN, we fork carefully designed side branches from the intermediate layers of a given neural network. Each branch can emerge dynamically from certain locations of the main branch; it not only retains the representations rooted in the backbone network but also generates more diverse representations along its own pathway. We go one step further and promote multi-level interactions among the different branches through an optimization formulation with probabilistic prediction matching losses, thus ensuring a more robust optimization process and better representation ability. Experiments on both category and instance recognition tasks demonstrate substantial improvements of the proposed method over its corresponding counterparts across diverse state-of-the-art CNN architectures. Code and models are publicly available at https://github.com/d-li14/DHM
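The abstract states that each branch is supervised while the branches also match one another's probabilistic predictions. As a rough illustration only, the following Python sketch computes one plausible version of such an objective for a single sample: a cross-entropy term per branch plus pairwise KL-divergence matching terms between branch predictions. The function names, the weight `alpha`, the convention that index 0 is the main branch, and the choice of all-pairs KL matching are assumptions made for illustration; the paper's actual formulation (which branches mimic which, how terms are weighted, whether targets are detached) may differ, and the official code at the repository above remains the authoritative reference.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the ground-truth class."""
    return -math.log(probs[label])


def kl_div(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions; terms with p_k = 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def hierarchical_mimicking_loss(
    branch_logits: Sequence[Sequence[float]], label: int, alpha: float = 1.0
) -> float:
    """Hypothetical combined objective for one sample (an assumption, not the paper's exact loss).

    branch_logits[0] is assumed to be the main branch; the rest are side branches.
    Every branch gets its own cross-entropy term, and every ordered pair (i, j)
    with i != j adds alpha * KL(p_j || p_i), pushing branch i toward branch j's prediction.
    """
    probs = [softmax(z) for z in branch_logits]
    ce = sum(cross_entropy(p, label) for p in probs)
    matching = 0.0
    for i, p_i in enumerate(probs):
        for j, p_j in enumerate(probs):
            if i != j:
                matching += kl_div(p_j, p_i)
    return ce + alpha * matching


if __name__ == "__main__":
    # Three branches (main + two hypothetical side branches), three classes.
    logits = [[2.0, 0.5, -1.0], [1.5, 0.7, -0.5], [0.9, 1.1, -0.2]]
    print(hierarchical_mimicking_loss(logits, label=0, alpha=1.0))
```

When all branches produce identical predictions the matching term vanishes and the loss reduces to the summed per-branch cross-entropy; disagreement between branches adds a positive penalty. In a real training loop these terms would be computed on tensors with an autodiff framework, where one common design choice in mutual-learning-style setups is to treat the target distribution `p_j` as a constant (stop-gradient) inside each KL term; this scalar sketch cannot express that distinction.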
