Paper Title

Accelerating the Training of Video Super-Resolution Models

Paper Authors

Lijian Lin, Xintao Wang, Zhongang Qi, Ying Shan

Paper Abstract

Although convolutional neural networks (CNNs) have recently demonstrated high-quality reconstruction for video super-resolution (VSR), efficiently training competitive VSR models remains a challenging problem. Training a VSR model usually takes an order of magnitude more time than training its image counterpart, leading to long research cycles. Existing VSR methods typically train models with fixed spatial and temporal sizes from beginning to end. The fixed sizes are usually set to large values for good performance, resulting in slow training. However, is such a rigid training strategy necessary for VSR? In this work, we show that it is possible to gradually train video models from small to large spatial/temporal sizes, i.e., in an easy-to-hard manner. In particular, the whole training is divided into several stages, and earlier stages use smaller spatial shapes. Inside each stage, the temporal size also varies from short to long while the spatial size remains unchanged. Training is accelerated by such a multigrid training strategy, as most of the computation is performed on smaller spatial and shorter temporal shapes. For further acceleration with GPU parallelization, we also investigate large-minibatch training without loss of accuracy. Extensive experiments demonstrate that our method significantly speeds up training (up to $6.2\times$ speedup in wall-clock training time) without performance drop for various VSR models. The code is available at https://github.com/TencentARC/Efficient-VSR-Training.
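
To make the schedule concrete, below is a minimal Python sketch of the easy-to-hard multigrid idea described in the abstract: training is split into stages of increasing spatial size, and within each stage the temporal length steps from short to long. The stage fractions, patch sizes, and clip lengths here are illustrative placeholders, not the paper's actual configuration; see the authors' repository for the real schedule.

```python
# Minimal sketch of a multigrid (easy-to-hard) training schedule.
# Stage boundaries, spatial patch sizes, and temporal lengths below are
# hypothetical values for illustration, not the paper's configuration.

STAGES = [
    # (fraction of total steps, spatial patch size, temporal lengths short -> long)
    (0.4, 32, (5, 10, 15)),
    (0.3, 48, (5, 10, 15)),
    (0.3, 64, (15,)),  # final stage trains at the full target shape
]

def multigrid_shape(step: int, total_steps: int) -> tuple[int, int]:
    """Return the (spatial_size, temporal_length) to use at a training step."""
    boundary = 0
    for frac, spatial, temporal_grid in STAGES:
        stage_len = int(frac * total_steps)
        if step < boundary + stage_len:
            # Split the stage evenly among its temporal lengths, shortest first.
            sub = (step - boundary) * len(temporal_grid) // stage_len
            return spatial, temporal_grid[sub]
        boundary += stage_len
    # Fall back to the full target shape if rounding leaves trailing steps.
    return STAGES[-1][1], STAGES[-1][2][-1]

if __name__ == "__main__":
    total = 100_000
    for s in (0, 20_000, 50_000, 90_000):
        print(s, multigrid_shape(s, total))  # e.g. 0 -> (32, 5), 90_000 -> (64, 15)
```

A training loop would call multigrid_shape(step, total_steps) each iteration to decide how to crop patches and sample clips for the next batch; since most steps then run on small patches and short clips, most of the wall-clock cost shifts to cheap iterations, which is the source of the reported speedup.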
