Generalizing Off-Policy Evaluation From a Causal Perspective For Sequential Decision-Making

Paper title

Generalizing Off-Policy Evaluation From a Causal Perspective For Sequential Decision-Making

Authors

Sonali Parbhoo, Shalmali Joshi, Finale Doshi-Velez

Abstract

Assessing the effects of a policy based on observational data from a different policy is a common problem across several high-stakes decision-making domains, and several off-policy evaluation (OPE) techniques have been proposed. However, these methods largely formulate OPE as a problem disassociated from the process used to generate the data (i.e., structural assumptions in the form of a causal graph). We argue that explicitly highlighting this association has important implications for our understanding of the fundamental limits of OPE. First, this implies that the current formulation of OPE corresponds to a narrow set of tasks, i.e., a specific causal estimand focused on prospective evaluation of policies over populations or sub-populations. Second, we demonstrate how this association motivates natural desiderata to consider a general set of causal estimands, particularly extending the role of OPE to counterfactual off-policy evaluation at the level of individuals in the population. A precise description of the causal estimand highlights which OPE estimands are identifiable from observational data under the stated generative assumptions. For those OPE estimands that are not identifiable, the causal perspective further highlights where more experimental data is necessary and identifies situations where human expertise can aid identification and estimation. Furthermore, many formalisms of OPE entirely overlook the role of uncertainty in the estimation process. We demonstrate how specifically characterising the causal estimand highlights the different sources of uncertainty and when human expertise can naturally manage this uncertainty. We discuss each of these aspects as actionable desiderata for future OPE research at scale and in line with practical utility.
