Paper Title

Causal Bandits for Linear Structural Equation Models

Paper Authors

Burak Varici, Karthikeyan Shanmugam, Prasanna Sattigeri, Ali Tajer

Paper Abstract

This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model to minimize cumulative regret with respect to the best intervention in hindsight. This is, naturally, posed as a causal bandit problem. The focus is on causal bandits for linear structural equation models (SEMs) and soft interventions. The graph's structure is assumed to be known, with $N$ nodes. Each node is assumed to have two linear mechanisms, one observational and one under a soft intervention, giving rise to $2^N$ possible interventions. The majority of existing causal bandit algorithms assume that at least the interventional distributions of the reward node's parents are fully specified. However, there are $2^N$ such distributions (one corresponding to each intervention), and acquiring them becomes prohibitive even in moderate-sized graphs. This paper dispenses with the assumption of knowing these distributions or their marginals. Two algorithms are proposed, for the frequentist (UCB-based) and Bayesian (Thompson Sampling-based) settings. The key idea of both algorithms is to avoid directly estimating the $2^N$ reward distributions and instead estimate the parameters that fully specify the SEMs (whose number is linear in $N$) and use them to compute the rewards. In both algorithms, under boundedness assumptions on the noise and the parameter space, the cumulative regret scales as $\tilde{\cal O}(d^{L+\frac{1}{2}}\sqrt{NT})$, where $d$ is the graph's maximum degree and $L$ is the length of its longest causal path. Additionally, a minimax lower bound of $\Omega(d^{\frac{L}{2}-2}\sqrt{T})$ is presented, which shows that the achievable and lower bounds agree in their scaling behavior with respect to the horizon $T$ and the graph parameters $d$ and $L$.
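To make the parameter-estimation idea concrete, here is a minimal Python sketch, not the authors' implementation. All names (`B_obs`, `B_int`, `expected_reward`) and the unit-mean noise are illustrative assumptions; the point it demonstrates is that the $2N$ weight vectors of a linear SEM (one observational and one interventional per node) suffice to score all $2^N$ candidate interventions, so the reward distributions never need to be estimated arm by arm.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

N = 4  # nodes in topological order; the last node is the reward node

# Hypothetical ground-truth weights. Upper-triangular matrices (k=1)
# respect the topological order, so the graph is a DAG.
B_obs = np.triu(rng.uniform(0.5, 1.0, (N, N)), k=1)  # observational mechanism
B_int = np.triu(rng.uniform(0.0, 0.5, (N, N)), k=1)  # soft-intervention mechanism
nu = np.ones(N)  # assumed noise mean (nonzero so interventions shift the reward)

def expected_reward(intervened, B_obs, B_int, nu):
    """Expected value of the reward node under the intervention set `intervened`.

    Node i keeps its observational incoming weights B_obs[:, i] unless i is
    intervened, in which case B_int[:, i] is used. With X = W^T X + eps and
    E[eps] = nu, we get E[X] = (I - W^T)^{-1} nu.
    """
    W = B_obs.copy()
    for i in intervened:
        W[:, i] = B_int[:, i]
    mean = np.linalg.solve(np.eye(N) - W.T, nu)
    return mean[-1]  # reward node is last in the topological order

# 2N weight vectors are enough to score every one of the 2^N arms.
best = max(
    (frozenset(a) for r in range(N + 1) for a in combinations(range(N), r)),
    key=lambda a: expected_reward(a, B_obs, B_int, nu),
)
print("best intervention set:", sorted(best))
```

In the paper's algorithms the weight matrices are not known but estimated online from samples (with optimism or posterior sampling driving exploration), and the arm scores are recomputed each round; the exhaustive enumeration above is only for illustration on a small graph.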
