Self-play Reinforcement Learning for Video Transmission

Paper title

Self-play Reinforcement Learning for Video Transmission

Authors

Huang, Tianchi; Zhang, Rui-Xiao; Sun, Lifeng

Abstract

Video transmission services rely on adaptive algorithms to meet users' demands. Existing techniques are usually optimized and evaluated with a function that linearly combines several weighted metrics. However, we observe that such a function fails to describe the requirement accurately, so methods built on it may end up violating the original requirement. To address this concern, we propose Zwei, a self-play reinforcement learning algorithm for video transmission tasks. Zwei updates its policy by using the actual requirement directly. Technically, Zwei samples a number of trajectories from the same starting point and immediately estimates the win rate with respect to the competition outcome, where the competition outcome indicates which trajectory is closer to the assigned requirement. Zwei then optimizes the policy by maximizing the win rate. To build Zwei, we develop simulation environments, design suitable neural network models, and devise training methods that handle the different requirements of various video transmission scenarios. Trace-driven analysis over two representative tasks demonstrates that Zwei faithfully optimizes itself according to the assigned requirement, outperforming state-of-the-art methods in all considered scenarios.
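
The competition step described in the abstract (several trajectories sampled from the same starting point, compared pairwise against the requirement, then summarized as a win rate) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `Outcome` fields, the rule "less rebuffering first, then higher bitrate", and the names `battle` and `win_rates` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    """Summary of one sampled trajectory (hypothetical metrics)."""
    rebuffer_s: float    # total rebuffering time in seconds, lower is better
    bitrate_kbps: float  # average bitrate in kbps, higher is better


def battle(a: Outcome, b: Outcome) -> float:
    """Return 1.0 if a wins, 0.0 if a loses, 0.5 on a tie.

    Assumed requirement (illustrative only): minimize rebuffering first,
    and break ties by preferring the higher bitrate.
    """
    if a.rebuffer_s != b.rebuffer_s:
        return 1.0 if a.rebuffer_s < b.rebuffer_s else 0.0
    if a.bitrate_kbps != b.bitrate_kbps:
        return 1.0 if a.bitrate_kbps > b.bitrate_kbps else 0.0
    return 0.5


def win_rates(outcomes: List[Outcome]) -> List[float]:
    """Estimate each trajectory's win rate against all other samples."""
    n = len(outcomes)
    if n < 2:
        raise ValueError("need at least two trajectories to compete")
    rates = []
    for i, a in enumerate(outcomes):
        # Sum the results of a's battles against every other trajectory.
        score = sum(battle(a, b) for j, b in enumerate(outcomes) if j != i)
        rates.append(score / (n - 1))
    return rates


# Three trajectories sampled from the same (hypothetical) starting point.
samples = [
    Outcome(rebuffer_s=0.0, bitrate_kbps=1200.0),
    Outcome(rebuffer_s=0.0, bitrate_kbps=1850.0),
    Outcome(rebuffer_s=2.5, bitrate_kbps=2850.0),
]
rates = win_rates(samples)
```

In a setup like this, the per-trajectory win rates would stand in for the weighted-sum reward as the signal that a policy update tries to maximize. How Zwei actually performs that update is not specified in the abstract, so the sketch stops at the win-rate estimate.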
