论文标题
通过切换成本进行专家校准的学习以进行在线优化
Expert-Calibrated Learning for Online Optimization with Switching Costs
论文作者
论文摘要
我们研究了在线凸优化,并通过切换成本,这是由于缺乏完整的离线信息而实际上重要但又极具挑战性的问题。通过利用基于机器学习的功能(ML)优化器,可以在本文中获得ML的在线算法(也称为专家校准)已成为最新技术的状态,并具有可证明的最差性能保证。尽管如此,通过使用训练ML模型作为独立优化器的标准实践并将其插入ML功能增强的算法,平均成本性能可能是高度令人满意的。为了解决“如何学习”挑战,我们提出了EC-L2O(专家校准的学习以优化),该挑战通过明确考虑下游专家校准器来训练基于ML的优化器。为此,我们提出了一个新的可区分的专家校准器,该校准器概括了正规化的在线平衡下降,并且在预测错误很大时提供了比纯ML预测更好的竞争比。对于训练,我们的损失函数是两个不同损失的加权总和 - 一种将平均ML预测误差最小化,以获得更好的鲁棒性,另一个最小化了校准后平均成本。我们还为EC-L2O提供了理论分析,强调了专家校准对于平均成本绩效甚至可能是有益的,并且EC-L2O与离线最佳Oracle(即尾部成本比率)所达到的成本相比的高级尾巴比可以界定。最后,我们通过运行模拟来测试EC-L2O,以实现可持续的数据中心需求响应。我们的结果表明,与现有基线算法相比,EC-L2O在经验上可以实现较低的平均成本和竞争比率较低。
We study online convex optimization with switching costs, a practically important but also extremely challenging problem due to the lack of complete offline information. By tapping into the power of machine learning (ML) based optimizers, ML-augmented online algorithms (also referred to as expert calibration in this paper) have been emerging as state of the art, with provable worst-case performance guarantees. Nonetheless, by using the standard practice of training an ML model as a standalone optimizer and plugging it into an ML-augmented algorithm, the average cost performance can be highly unsatisfactory. In order to address the "how to learn" challenge, we propose EC-L2O (expert-calibrated learning to optimize), which trains an ML-based optimizer by explicitly taking into account the downstream expert calibrator. To accomplish this, we propose a new differentiable expert calibrator that generalizes regularized online balanced descent and offers a provably better competitive ratio than pure ML predictions when the prediction error is large. For training, our loss function is a weighted sum of two different losses -- one minimizing the average ML prediction error for better robustness, and the other one minimizing the post-calibration average cost. We also provide theoretical analysis for EC-L2O, highlighting that expert calibration can be even beneficial for the average cost performance and that the high-percentile tail ratio of the cost achieved by EC-L2O to that of the offline optimal oracle (i.e., tail cost ratio) can be bounded. Finally, we test EC-L2O by running simulations for sustainable datacenter demand response. Our results demonstrate that EC-L2O can empirically achieve a lower average cost as well as a lower competitive ratio than the existing baseline algorithms.