Paper Title

Observational and Interventional Causal Learning for Regret-Minimizing Control

Authors

Reiser, Christian

Abstract

We explore how observational and interventional causal discovery methods can be combined. LPCMCI, a state-of-the-art observational causal discovery algorithm for time series that can handle latent confounders and contemporaneous effects, is extended to profit from causal constraints found through randomized controlled trials. Numerical results show that, given perfect interventional constraints, the structural causal models (SCMs) reconstructed by the extended LPCMCI allow optimal prediction of the target variable 84.6% of the time. The implementation of interventional and observational causal discovery is modular, so causal constraints from other sources can be incorporated. The second part of this thesis investigates regret-minimizing control by simultaneously learning a causal model and planning actions through it. The idea is that an agent that aims to optimize a measured variable first learns the system's mechanics through observational causal discovery. The agent then intervenes on the most promising variable with randomized values, which allows both exploitation and the generation of new interventional data. It then uses the interventional data to further enhance the causal model, allowing improved actions the next time. The extended LPCMCI can be favorable compared to the original LPCMCI algorithm: the numerical results show that detecting and using interventional constraints leads to reconstructed SCMs that allow optimal prediction of the target variable 60.9% of the time, in contrast to the baseline of 53.6% when using the original LPCMCI algorithm. Furthermore, the induced average regret decreases from 1.2 with the original LPCMCI algorithm to 1.0 with the extended LPCMCI algorithm with interventional discovery.
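
The abstract reports an average regret but does not define it. One common reading, assumed here, is the gap between the expected target value under the best possible intervention and the expected target value under the intervention the agent actually chose, averaged over steps. The toy sketch below illustrates only that reading: the linear effect sizes, the variable names `x0` to `x2`, and the helper functions are all hypothetical and are not taken from the thesis or from LPCMCI.

```python
# Hypothetical toy system: the target responds linearly to an intervention
# on one of three variables. These effect sizes are made up for illustration.
TRUE_EFFECT = {"x0": 0.5, "x1": -1.0, "x2": 0.2}


def expected_target(variable, value, effects=TRUE_EFFECT):
    """Expected target value after setting `variable` to `value`."""
    return effects[variable] * value


def best_action(effects, values=(-1.0, 1.0)):
    """Pick the (variable, value) pair with the highest predicted target."""
    return max(
        ((var, val) for var in effects for val in values),
        key=lambda action: effects[action[0]] * action[1],
    )


def average_regret(actions):
    """Mean gap between the optimal expected target and the achieved one."""
    opt_var, opt_val = best_action(TRUE_EFFECT)
    optimum = expected_target(opt_var, opt_val)
    gaps = [optimum - expected_target(var, val) for var, val in actions]
    return sum(gaps) / len(gaps)


# An agent whose learned model has the wrong sign for x1 ...
wrong_model = {"x0": 0.5, "x1": 1.0, "x2": 0.2}
# ... versus an agent whose learned model matches the true effects.
right_model = dict(TRUE_EFFECT)

regret_wrong = average_regret([best_action(wrong_model)] * 10)
regret_right = average_regret([best_action(right_model)] * 10)
```

In this sketch the agent that plans with the correct model incurs zero regret, while the agent with the mis-signed edge incurs positive regret at every step. This is the mechanism by which a better reconstructed causal model would be expected to lower the average regret.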
