Paper Title

Hyperparameter Selection Methods for Fitted Q-Evaluation with Error Guarantee

Paper Author

Miyaguchi, Kohei

Abstract

We are concerned with the problem of hyperparameter selection for fitted Q-evaluation (FQE). FQE is one of the state-of-the-art methods for offline policy evaluation (OPE), which is essential to reinforcement learning without environment simulators. However, like other OPE methods, FQE is not itself hyperparameter-free, which undermines its utility in real-life applications. We address this issue by proposing a framework of approximate hyperparameter selection (AHS) for FQE, which defines a notion of optimality (called selection criteria) in a quantitative and interpretable manner without hyperparameters. We then derive four AHS methods, each with different characteristics such as distribution-mismatch tolerance and time complexity. We also confirm in experiments that the error bound given by the theory matches empirical observations.
