Paper Title
A theory of independent mechanisms for extrapolation in generative models
Paper Authors
Paper Abstract
Generative models can be trained to emulate complex empirical data, but are they useful for making predictions in the context of previously unobserved environments? An intuitive idea for promoting such extrapolation capabilities is to have the architecture of such a model reflect a causal graph of the true data-generating process, so that one can intervene on each node independently of the others. However, the nodes of this graph are usually unobserved, leading to overparameterization and a lack of identifiability of the causal structure. We develop a theoretical framework to address this challenging situation by defining a weaker form of identifiability based on the principle of independence of mechanisms. We demonstrate on toy examples that classical stochastic gradient descent can hinder the model's extrapolation capabilities, suggesting that independence of mechanisms should be enforced explicitly during training. Experiments on deep generative models trained on real-world data support these insights and illustrate how the extrapolation capabilities of such models can be leveraged.
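The abstract's central intuition, intervening on one mechanism of a modular generative model while leaving the others fixed, can be conveyed with a minimal toy sketch. The decomposition and names below (shape_mechanism, color_mechanism, the hue parameter) are illustrative assumptions, not the paper's actual architecture or method:

```python
import numpy as np

# Toy modular generative model: x = color_mechanism(shape_mechanism(z)).
# Each function stands in for one "mechanism" (node) of a causal graph;
# this is a hypothetical decomposition for illustration only.

rng = np.random.default_rng(0)

def shape_mechanism(z, scale=1.0):
    # Maps latent noise to an intermediate "shape" representation.
    return scale * np.tanh(z)

def color_mechanism(h, hue=0.5):
    # Maps the intermediate representation to the observed 2-D data.
    return np.stack([hue * h, (1.0 - hue) * h], axis=-1)

z = rng.normal(size=1000)

# Observational samples from the unmodified model.
x_obs = color_mechanism(shape_mechanism(z))

# Intervention: swap in a new color mechanism (hue=0.9) while leaving the
# shape mechanism untouched -- emulating extrapolation to an environment
# never seen during training.
x_intervened = color_mechanism(shape_mechanism(z), hue=0.9)

print("observational mean:", x_obs.mean(axis=0))
print("intervened mean:   ", x_intervened.mean(axis=0))
```

Such targeted interventions are only meaningful if each module captures a distinct mechanism; as the abstract notes, the latent nodes are unobserved, so without an explicit independence criterion the learned modules need not align with the true mechanisms.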