Paper Title

Towards intervention-centric causal reasoning in learning agents

Paper Authors

Lansdell, Benjamin

Paper Abstract

Interventions are central to causal learning and reasoning. Yet ultimately an intervention is an abstraction: an agent embedded in a physical environment (perhaps modeled as a Markov decision process) does not typically come equipped with the notion of an intervention -- its action space is typically ego-centric, without actions of the form "intervene on X". Such a correspondence between ego-centric actions and interventions would be challenging to hard-code. It would instead be better if an agent learnt which sequences of actions allow it to make targeted manipulations of the environment, and learnt corresponding representations that permitted learning from observation. Here we show how a meta-learning approach can be used to perform causal learning in this challenging setting, where the action space is not a set of interventions and the observation space is a high-dimensional space with a latent causal structure. A meta-reinforcement learning algorithm is used to learn relationships that transfer to observational causal learning tasks. This work shows how advances in deep reinforcement learning and meta-learning can provide intervention-centric causal learning in high-dimensional environments with a latent causal structure.
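
The abstract describes the setting only at a high level. As a rough, hedged illustration of that setting (not the paper's implementation; the environment, reward, and all names below are assumptions), the sketch below builds a toy environment in which each episode samples a latent causal graph, observations are high-dimensional renderings of the underlying variables, and the agent's ego-centric actions only indirectly clamp variables. A recurrent meta-RL policy trained across many such episodes would replace the random policy used here.

```python
# Illustrative sketch only: a meta-episode samples a latent causal graph, the
# agent sees only high-dimensional observations and acts through ego-centric
# actions rather than explicit "intervene on X" operations.
import numpy as np


class LatentCausalEnv:
    """Each episode draws a random linear DAG over `n_vars` variables and
    renders the variable values through a fixed random projection, so the
    causal structure is latent in a high-dimensional observation."""

    def __init__(self, n_vars=5, obs_dim=64, seed=None):
        self.rng = np.random.default_rng(seed)
        self.n_vars, self.obs_dim = n_vars, obs_dim
        self.proj = self.rng.normal(size=(obs_dim, n_vars))  # fixed renderer

    def reset(self):
        # Random upper-triangular weight matrix = random DAG (causal order 0..n-1).
        w = self.rng.normal(size=(self.n_vars, self.n_vars))
        mask = self.rng.random((self.n_vars, self.n_vars)) < 0.5
        self.graph = np.triu(w, k=1) * mask
        self.state = self._sample_values()
        return self._render()

    def _sample_values(self, clamp=None):
        # Ancestral sampling; `clamp` mimics the *effect* of an intervention,
        # which the agent can only reach indirectly through its actions.
        v = np.zeros(self.n_vars)
        for i in range(self.n_vars):
            v[i] = self.graph[:, i] @ v + self.rng.normal()
            if clamp is not None and i == clamp[0]:
                v[i] = clamp[1]
        return v

    def step(self, action):
        # Ego-centric action: the agent nudges one variable rather than
        # issuing an explicit intervention primitive.
        self.state = self._sample_values(clamp=(action % self.n_vars, 1.0))
        reward = float(self.state[-1])  # toy objective: drive the sink variable up
        return self._render(), reward

    def _render(self):
        return self.proj @ self.state


if __name__ == "__main__":
    env = LatentCausalEnv(seed=0)
    obs = env.reset()
    # A meta-RL agent (e.g. a recurrent policy trained with policy gradients)
    # would adapt within each episode; here a random policy stands in for it.
    for t in range(10):
        obs, r = env.step(np.random.randint(5))
        print(f"t={t} reward={r:+.2f}")
```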
