Paper Title
Learning-NUM: Network Utility Maximization with Unknown Utility Functions and Queueing Delay
Paper Authors
Paper Abstract
Network Utility Maximization (NUM) studies the problem of allocating traffic rates to network users in order to maximize the users' total utility subject to network resource constraints. In this paper, we propose a new NUM framework, Learning-NUM, where the users' utility functions are unknown a priori and the utility value of an allocated traffic rate can be observed only after the corresponding traffic is delivered to the destination, which means that the utility feedback experiences \textit{queueing delay}. The goal is to design a policy that gradually learns the utility functions and makes rate-allocation and network scheduling/routing decisions so as to maximize the total utility obtained over a finite time horizon $T$. In addition to the unknown utility functions and stochastic constraints, a central challenge of our problem lies in the queueing delay of the observations, which may be unbounded and depends on the decisions of the policy. We first show that the expected total utility obtained by the best dynamic policy is upper bounded by the optimal value of a static optimization problem. In the absence of feedback delay, we design an algorithm based on the ideas of gradient estimation and Max-Weight scheduling. To handle the feedback delay, we embed this algorithm in a parallel-instance paradigm, yielding a policy that achieves $\tilde{O}(T^{3/4})$ regret, i.e., the difference between the expected total utility obtained by the best dynamic policy and by our policy is $\tilde{O}(T^{3/4})$. Finally, to demonstrate the practical applicability of the Learning-NUM framework, we apply it to three application scenarios: database query, job scheduling, and video streaming. We further conduct simulations on the job scheduling application to evaluate the empirical performance of our policy.
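To make the abstract's algorithmic ideas concrete, below is a minimal sketch of the delay-free building block it describes: a finite-difference (bandit-style) estimate of each user's unknown utility gradient, combined with queue-backlog-penalized rate updates and Max-Weight service. Everything here is an assumption for illustration, not taken from the paper: the single-link toy instance, the hidden utilities inside `observed_utility`, and the parameters `delta` and `eta` are all hypothetical, and the sketch assumes instantaneous utility feedback, ignoring the queueing delay that the paper's parallel-instance paradigm is designed to handle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: K users share one link of capacity C.
K, C, T = 3, 1.0, 10_000
delta, eta = 0.05, 0.01          # perturbation size and step size (assumed)
rates = np.full(K, C / K)        # current rate allocation
queues = np.zeros(K)             # per-user queue backlogs

def observed_utility(k, r):
    """Stand-in for the unknown, noisily observed utility of user k at rate r.
    In Learning-NUM this feedback would arrive only after delivery (queueing
    delay); here it is returned immediately for simplicity."""
    true_u = [np.log1p, np.sqrt, lambda x: 1 - np.exp(-2 * x)][k]
    return true_u(r) + 0.01 * rng.standard_normal()

for t in range(T):
    for k in range(K):
        # Two-point finite-difference estimate of the unknown utility gradient
        # at the current rate (stands in for the paper's gradient-estimation step).
        g = (observed_utility(k, rates[k] + delta)
             - observed_utility(k, rates[k] - delta)) / (2 * delta)
        # Gradient ascent penalized by queue backlog (drift-plus-penalty flavor),
        # projected back onto the feasible rate interval [0, C].
        rates[k] = np.clip(rates[k] + eta * (g - queues[k]), 0.0, C)
    # Max-Weight service: serve the user with the largest backlog.
    served = np.argmax(queues)
    queues += rates                      # arrivals injected at the chosen rates
    queues[served] = max(queues[served] - C, 0.0)

print("final rates:", np.round(rates, 3))
```

In the paper's actual setting, the feedback for a rate chosen at time $t$ is observed only after the corresponding traffic leaves the queue, which is why the full policy runs parallel instances of a building block like this one rather than relying on the instantaneous feedback assumed above.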