Neural Rate Control for Video Encoding using Imitation Learning

Paper title

Neural Rate Control for Video Encoding using Imitation Learning

Authors

Mao, Hongzi; Gu, Chenjie; Wang, Miaosen; Chen, Angie; Lazic, Nevena; Levine, Nir; Pang, Derek; Claus, Rene; Hechtman, Marisabel; Chiang, Ching-Han; Chen, Cheng; Han, Jingning

Abstract

In modern video encoders, rate control is a critical component and has been heavily engineered. It decides how many bits to spend to encode each frame, in order to optimize the rate-distortion trade-off over all video frames. This is a challenging constrained planning problem because of the complex dependency among decisions for different video frames and the bitrate constraint defined at the end of the episode. We formulate the rate control problem as a Partially Observable Markov Decision Process (POMDP), and apply imitation learning to learn a neural rate control policy. We demonstrate that by learning from optimal video encoding trajectories obtained through evolution strategies, our learned policy achieves better encoding efficiency and has minimal constraint violation. In addition to imitating the optimal actions, we find that additional auxiliary losses, data augmentation/refinement and inference-time policy improvements are critical for learning a good rate control policy. We evaluate the learned policy against the rate control policy in libvpx, a widely adopted open-source VP9 codec library, in the two-pass variable bitrate (VBR) mode. We show that over a diverse set of real-world videos, our learned policy achieves an 8.5% median bitrate reduction without sacrificing video quality.
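The abstract's recipe (find a near-optimal encoding trajectory per video with an evolution strategy, then train a policy to imitate those per-frame actions) can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not libvpx and not the authors' code: the per-frame complexity values `c`, the rate model `bits = c * 2**(-q/6)`, the distortion model `2**(q/6)`, and every function name (`episode_cost`, `es_expert`, `features`, `behavior_cloning`, `policy`) are hypothetical. The expert is a plain (1+1)-ES with the 1/5th success rule, which may differ from the ES variant used in the paper, and the "neural" policy is reduced to a linear model fit by least squares, i.e. behavior cloning as ordinary regression. The observation includes a hypothetical first-pass summary of the whole clip, loosely mirroring the statistics available in two-pass VBR.

```python
"""Toy sketch: imitation learning for rate control from ES-optimized trajectories.

Hypothetical throughout (not libvpx, not the paper's code): the complexity
values, the rate/distortion models, the ES variant and the linear policy are
all illustrative assumptions.
"""
import math
import random


def episode_cost(z, cs, budget):
    """Total distortion and per-frame QPs for the allocation encoded by z."""
    # Softmax of z splits the episode budget across frames, so the expert
    # search always meets the end-of-episode bitrate constraint exactly.
    m = max(z)
    w = [math.exp(v - m) for v in z]
    total = sum(w)
    bits = [budget * v / total for v in w]
    # Assumed models: bits = c * 2**(-q/6), distortion = 2**(q/6) = c / bits.
    qs = [-6 * math.log2(b / c) for b, c in zip(bits, cs)]
    return sum(c / b for c, b in zip(cs, bits)), qs


def es_expert(cs, budget, iters=4000, seed=0):
    """(1+1)-ES with the 1/5th success rule; returns the best QP trajectory."""
    rng = random.Random(seed)
    z, sigma = [0.0] * len(cs), 1.0
    best, qs = episode_cost(z, cs, budget)
    for _ in range(iters):
        cand = [v + sigma * rng.gauss(0.0, 1.0) for v in z]
        f, cand_qs = episode_cost(cand, cs, budget)
        if f < best:
            z, best, qs = cand, f, cand_qs
            sigma *= 1.5  # success: widen the search
        else:
            sigma *= 1.5 ** -0.25  # failure: shrink (balances at 1/5 success)
    return qs


def features(c, cs, budget):
    """Observation: frame complexity plus a hypothetical first-pass summary."""
    s = sum(math.sqrt(x) for x in cs)
    return [math.log2(c), math.log2(s / budget), 1.0]


def fit_least_squares(xs, ys):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    n = len(xs[0])
    a = [[sum(x[i] * x[j] for x in xs) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(xs, ys))] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col:
                k = a[r][col] / a[col][col]
                a[r] = [u - k * v for u, v in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def behavior_cloning(episodes):
    """Fit the policy to (observation, expert QP) pairs from every episode."""
    xs, ys = [], []
    for cs, budget in episodes:
        for c, q in zip(cs, es_expert(cs, budget)):
            xs.append(features(c, cs, budget))
            ys.append(q)
    return fit_least_squares(xs, ys)


def policy(weights, c, cs, budget):
    """Learned (linear) policy: predicted QP for one frame."""
    return sum(w * f for w, f in zip(weights, features(c, cs, budget)))


rng = random.Random(1)
train = [([rng.uniform(1.0, 8.0) for _ in range(8)], rng.uniform(5.0, 20.0))
         for _ in range(6)]
weights = behavior_cloning(train)
print([round(w, 2) for w in weights])  # expected to land near [3, 6, 0]
```

Under these assumed models the optimal allocation gives each frame bits proportional to `sqrt(c)`, so the optimal QP is exactly `3*log2(c) + 6*log2(S/B)` with `S` the sum of `sqrt(c)` over the clip and `B` the budget; the summary feature was picked for that reason, and the fitted weights should come out near `[3, 6, 0]`. The expert meets the budget by construction, while the learned policy only meets it approximately, which is the kind of constraint violation the abstract measures. Real encoders have no such closed form; the abstract reports that a neural policy with auxiliary losses, data augmentation/refinement and inference-time improvements was needed there.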
