Paper Title
Online Convex Optimization Perspective for Learning from Dynamically Revealed Preferences
Paper Authors
Paper Abstract
We study the problem of online learning (OL) from revealed preferences: a learner wishes to learn a non-strategic agent's private utility function by observing the agent's utility-maximizing actions in a changing environment. We adopt an online inverse optimization setup, where the learner observes a stream of the agent's actions in an online fashion and the learning performance is measured by the regret associated with a loss function. We first characterize a special but broad class of agents' utility functions, then exploit this structure in the design of a new convex loss function. We establish that regret with respect to our new loss function also bounds the regret with respect to all other standard loss functions in the literature. This allows us to design a flexible OL framework that enables a unified treatment of loss functions and supports a variety of online convex optimization algorithms. We demonstrate, with theoretical and empirical evidence, that our framework based on the new loss function (in particular, online Mirror Descent) has significant advantages in regret performance and solution time over other OL algorithms from the literature, and that it bypasses technical assumptions required by previous approaches.
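To make the setup concrete, below is a minimal, illustrative sketch of the kind of online Mirror Descent loop the abstract describes: each round the environment reveals a fresh finite action set, the agent plays the action maximizing a hidden linear utility, and the learner updates its estimate via entropic mirror descent on a convex suboptimality-style surrogate loss. The linear utility model, the specific surrogate loss, and all names (theta_star, theta_hat, X_t, eta) are assumptions made for illustration, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, eta = 5, 200, 0.5

# Agent's private utility weights (hidden from the learner); an assumed
# linear utility u(x; theta) = <theta, x> with theta on the simplex.
theta_star = rng.dirichlet(np.ones(d))
theta_hat = np.ones(d) / d  # learner's estimate, start uniform

total_loss = 0.0
for t in range(T):
    # Changing environment: a fresh finite set of 10 candidate actions.
    X_t = rng.random((10, d))
    # Agent reveals its utility-maximizing action under theta_star.
    x_t = X_t[np.argmax(X_t @ theta_star)]

    # Convex suboptimality-style surrogate loss (an illustrative choice):
    #   l_t(theta) = max_{x in X_t} <theta, x> - <theta, x_t>,
    # a pointwise max of linear functions minus a linear function.
    x_best = X_t[np.argmax(X_t @ theta_hat)]
    loss = theta_hat @ x_best - theta_hat @ x_t
    total_loss += loss

    # Entropic mirror descent step on the simplex (multiplicative update),
    # using a subgradient of l_t at theta_hat: x_best - x_t.
    grad = x_best - x_t
    theta_hat = theta_hat * np.exp(-eta * grad)
    theta_hat /= theta_hat.sum()

print(f"average surrogate loss over {T} rounds: {total_loss / T:.4f}")
print("estimated weights:", np.round(theta_hat, 3))
```

The multiplicative update is the entropic mirror descent step on the probability simplex; swapping in the Euclidean mirror map (with a projection onto the simplex) would recover projected online gradient descent as an alternative instance of the same framework.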