通过切换成本进行专家校准的学习以进行在线优化

论文标题

通过切换成本进行专家校准的学习以进行在线优化

Expert-Calibrated Learning for Online Optimization with Switching Costs

论文作者

Li, Pengfei, Yang, Jianyi, Ren, Shaolei

论文摘要

我们研究了在线凸优化，并通过切换成本，这是由于缺乏完整的离线信息而实际上重要但又极具挑战性的问题。通过利用基于机器学习的功能（ML）优化器，可以在本文中获得ML的在线算法（也称为专家校准）已成为最新技术的状态，并具有可证明的最差性能保证。尽管如此，通过使用训练ML模型作为独立优化器的标准实践并将其插入ML功能增强的算法，平均成本性能可能是高度令人满意的。为了解决“如何学习”挑战，我们提出了EC-L2O（专家校准的学习以优化），该挑战通过明确考虑下游专家校准器来训练基于ML的优化器。为此，我们提出了一个新的可区分的专家校准器，该校准器概括了正规化的在线平衡下降，并且在预测错误很大时提供了比纯ML预测更好的竞争比。对于训练，我们的损失函数是两个不同损失的加权总和 - 一种将平均ML预测误差最小化，以获得更好的鲁棒性，另一个最小化了校准后平均成本。我们还为EC-L2O提供了理论分析，强调了专家校准对于平均成本绩效甚至可能是有益的，并且EC-L2O与离线最佳Oracle（即尾部成本比率）所达到的成本相比的高级尾巴比可以界定。最后，我们通过运行模拟来测试EC-L2O，以实现可持续的数据中心需求响应。我们的结果表明，与现有基线算法相比，EC-L2O在经验上可以实现较低的平均成本和竞争比率较低。

We study online convex optimization with switching costs, a practically important but also extremely challenging problem due to the lack of complete offline information. By tapping into the power of machine learning (ML) based optimizers, ML-augmented online algorithms (also referred to as expert calibration in this paper) have been emerging as state of the art, with provable worst-case performance guarantees. Nonetheless, by using the standard practice of training an ML model as a standalone optimizer and plugging it into an ML-augmented algorithm, the average cost performance can be highly unsatisfactory. In order to address the "how to learn" challenge, we propose EC-L2O (expert-calibrated learning to optimize), which trains an ML-based optimizer by explicitly taking into account the downstream expert calibrator. To accomplish this, we propose a new differentiable expert calibrator that generalizes regularized online balanced descent and offers a provably better competitive ratio than pure ML predictions when the prediction error is large. For training, our loss function is a weighted sum of two different losses -- one minimizing the average ML prediction error for better robustness, and the other one minimizing the post-calibration average cost. We also provide theoretical analysis for EC-L2O, highlighting that expert calibration can be even beneficial for the average cost performance and that the high-percentile tail ratio of the cost achieved by EC-L2O to that of the offline optimal oracle (i.e., tail cost ratio) can be bounded. Finally, we test EC-L2O by running simulations for sustainable datacenter demand response. Our results demonstrate that EC-L2O can empirically achieve a lower average cost as well as a lower competitive ratio than the existing baseline algorithms.

下载PDF全文

下载文献需遵守相关版权规定

论文标题