Paper title
Optimizing Sequential Experimental Design with Deep Reinforcement Learning
Authors
Abstract
Bayesian approaches developed to solve the optimal design of sequential experiments are mathematically elegant but computationally challenging. Recently, techniques using amortization have been proposed to make these Bayesian approaches practical, by training a parameterized policy that proposes designs efficiently at deployment time. However, these methods may not sufficiently explore the design space, require access to a differentiable probabilistic model and can only optimize over continuous design spaces. Here, we address these limitations by showing that the problem of optimizing policies can be reduced to solving a Markov decision process (MDP). We solve the equivalent MDP with modern deep reinforcement learning techniques. Our experiments show that our approach is also computationally efficient at deployment time and exhibits state-of-the-art performance on both continuous and discrete design spaces, even when the probabilistic model is a black box.